A Method For Predicting Protein Structure

1. Introduction 

The present invention relates to methods of 
predicting the tendency of a portion of a protein to 
form amphiphilic α or β structure.

2. Background Of The Invention

2.1. Methods For Determining Protein Structure
Several algorithms are currently used to evaluate
the secondary structure of proteins, including the
Kyte-Doolittle, Chou-Fasman-Prevelige, and PHD methods.

The Kyte-Doolittle method (Kyte and Doolittle,
1982, J. Mol. Biol. 157: 105-132) assigns a
hydropathy value to each amino acid in the order in
which the residues appear in a protein. The program
then moves a segment of predetermined length along the
sequence and computes the average hydropathy within
each segment. Although the program can accurately
predict interior and exterior regions of soluble
globular proteins, its results for the
membrane-spanning regions of transmembrane proteins
are more ambiguous.
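
The moving-segment averaging described above can be sketched as follows. This is a minimal illustration only, not the program of the invention: the function name window_hydropathy and the choice to report only full windows (ignoring the truncated windows at the two ends of the sequence) are assumptions made for brevity. The scale values are the published Kyte-Doolittle hydropathy indices.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, span=7):
    """Average hydropathy of every full window of `span` residues.

    Returns a list of (centre_residue_number, mean) pairs, where the
    residue number is 1-based and refers to the middle of the window.
    Windows that would run past either end of the sequence are skipped
    (an assumption of this sketch).
    """
    if span % 2 == 0:
        raise ValueError("span must be odd so the window has a centre residue")
    half = span // 2
    profile = []
    for centre in range(half, len(seq) - half):
        window = seq[centre - half:centre + half + 1]
        mean = sum(KD[aa] for aa in window) / span
        profile.append((centre + 1, mean))
    return profile


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from any protein in the text.
    for residue, mean in window_hydropathy("MKLLIVAAGGSDERK", span=7):
        print(residue, round(mean, 2))
```

An odd span is required so that each average can be assigned to a single central residue.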

The Chou-Fasman-Prevelige (CFP) algorithm
(Prevelige and Fasman, 1989, in "Prediction of Protein
Structure and the Principles of Protein Conformation",
Fasman, ed., Plenum Press, New York, pp. 391-416) uses
a statistical approach to the study of protein
secondary structure. The conformational parameters for
each amino acid are calculated using the relative
frequency of a given amino acid within a protein, its
occurrence in a given type of secondary structure, and
the fraction of residues occurring in that type of
structure. Since these parameters (such as
hydrophobicity) contain information about protein
stability, properly weighted for their relative
importance, they are useful for predicting secondary
structures. These parameters, represented by Pα, Pβ
or Pc (for α-helices, β-sheets or coils, respectively),
are used to locate nucleation sites within an amino
acid sequence. These nucleation sites are then extended
until a stretch unlikely to belong to that structure is
encountered, whereupon that structure is terminated.
This process is repeated throughout the sequence until
the secondary structure of the entire sequence is
predicted.

The PHD method (Rost and Sander, 1992, Nature 360:
540) uses a combination of evolutionary and
multiple sequence alignment information, and a "jury"
of 12 networks. Since this method is a fully automated
computer program, it is independent of human input or
interpretation and as such delivers a unique approach.



2.2. Structure Of Glucose Transport Proteins
Mammalian glucose transporter proteins (GLUTs)
constitute a family of proteins which are integrally
embedded in the cell membrane and primarily transport
glucose into and out of cells. Recent evidence
indicates that compounds other than glucose, for
example, water, dehydroascorbic acid and nicotinamide,
can traverse GLUTs, suggesting that these proteins may
be multifunctional.
For example, glucose transporter proteins have
recently been shown to exhibit a modest permeability to
water (Fischbarg et al., 1990, Proc. Natl. Acad. Sci.
U.S.A. 87: 3244-3247), suggesting that there is a channel
in glucose transporter proteins that is hydrated and
may serve as a conduit for the substrates mentioned in
the paragraph above. Further, GLUT proteins may play a
role in the pathogenesis of diabetes, in that insulin
elicits a specific and rapid response from GLUT
proteins in human muscle and fat cells, where a rapid
translocation of GLUT from an internal storage pool to
the plasma membrane occurs, thereby increasing the
glucose uptake by these cells. In adipocytes, the Km
for glucose may also be lowered as a response to
insulin.

The GLUT proteins have been well characterized
biochemically and their primary structures have been
determined. But as is the case with many membrane
proteins, the secondary structures of GLUTs are largely
unknown, greatly hindering any study of their molecular
mechanisms.

The hitherto most favored model of GLUT secondary
structure predicts that GLUT proteins form 12
transmembrane α-helices (12H model; Mueckler et al.,
1985, Science 229: 941-945). Further studies
suggesting a high α-helical content include Chin et
al., 1986, J. Biol. Chem. 261: 7101-7104 (Fourier
transform infrared spectroscopy, FTIR) and Chin et al.,
1987, Proc. Natl. Acad. Sci. U.S.A. 84: 4113-4116
(circular dichroism, CD). Other studies have suggested
that extensive α-helical content is accompanied by
significant β-folding (FTIR spectroscopy: Alvarez et
al., 1987, J. Biol. Chem. 262: 3502-3509; CD: Park et
al., 1992, Protein Science 1: 1032-1049), but have
failed to appreciate the full extent of the β-structure
predicted by the present invention.
The 12H model indicates that the highly conserved
sequence (Ile 386 - Ala 405) in a particular GLUT
protein, GLUT1, is intracellular. However, recent
experiments (Fischbarg et al., 1993, Proc. Natl. Acad.
Sci. U.S.A. 90: 11658-11662) utilizing a synthetic
polyclonal antibody to this conserved region showed
that the antibody induced an increased glucose uptake
only when administered extracellularly. This is
inconsistent with the purported intracellular location
of the region in the 12H model. These data,
contradicting the established model, prompted further
analysis of GLUT secondary structure using the novel
algorithm of the invention and, as set forth below, led
to the discovery of a new model for GLUT structure.

3. Summary Of The Invention

The present invention relates to methods of
predicting the tendency of a portion of a protein to
form amphiphilic α or β structure. It is based, at
least in part, on the discovery that porin membrane
proteins, which were previously assumed to contain
predominantly α amphiphilic structure, unexpectedly are
predicted to contain substantial amounts of β
structure.

The methods of the present invention provide a
number of advantages relative to methods previously
used to analyze protein structure. For example, the CFP
algorithm fails to consider hydrophobicity and
amphiphilicity, and is more ambiguous in its
predictions than the algorithms of the present
invention. The CFP peaks are not fully representative
of the actual protein structure, whereas the peaks seen
by the Union program may provide a better visual
representation of actual secondary structure.

In particular embodiments, the methods of the
invention may be used to predict the presence of
β-barrel structures in membrane proteins. The prediction
of such structures in the protein may then be used for
the rational design or identification of compounds that
may interact with the protein. Alternatively, the
methods of the invention may be used to create β-barrel
structures in genetically engineered proteins.
4. Description Of The Figures

FIGURE 1. (D, E, H, and I) Data represent averaged
values from two 10-oocyte groups; other data are
averages from three such groups. Individual values
differed from each other by <20%. For 60 min before
the uptake assay, one group of oocytes (intracellular
Ab, solid bars) was injected with 20-30 nl of a
solution containing either Ab-1, Ab-4, or Ab-c (1 ng of
Ab per 1 nl of water). A second group of oocytes
(extracellular Ab, shaded bars) was incubated for 60
min in MBS containing the same Abs before measuring
3H-DOG uptake. Control oocytes (open bars) were incubated
in MBS. (D) Oocytes were incubated for 60 min with Ab
in the outside incubation medium; the Ab concentration
was varied as indicated. Solid circles, Ab-c; open
circles (controls), Ab-4. (E) Oocytes incubated with
Ab-c plus the addition of various concentrations of a
peptide. The following peptides were used: solid
circles, the conserved peptide Ile-386-Ala-405; open
circles, the last 20 amino acids at the C-terminal end
of GLUT4. (F) Oocytes incubated with Abs in the outside
medium. Solid circles, Ab-c; open circles, Ab-4. (G)
Open circles, oocytes incubated initially in medium
containing 1 µM insulin; arrow, the medium was replaced
by another one containing insulin plus Ab-c (100
ng/ml). Solid circles, Ab-c in the initial incubation
medium; Ab-c plus insulin after the arrow. (H and I)
Lineweaver-Burk plots of 3H-DOG uptake in oocytes
expressing GLUT1 and GLUT4, respectively, and incubated
in the following media: open circles, MBS (controls);
solid circles, MBS plus Ab-c (100 ng/ml); triangles,
MBS plus 1 µM insulin.

FIGURE 2. Multiple sequence alignment of two
porins (OmpF, SEQ ID NO:1 and S16070, SEQ ID NO:2) and
GLUT1 (SEQ ID NO:3). S16070 stands for POR.
Rectangles, existing (OmpF, POR) and predicted (GLUT1)
β-strands. Rounded rectangles, existing (OmpF, POR)
and predicted (GLUT1) α-helices.

FIGURE 3. Prediction of porin structures using
Union. Area graphs, Uβ7 prediction profiles. Structures
known from crystallography (cryst.) or predicted (prd)
are shown above the profiles in each case.

FIGURE 4. Our prediction for GLUT1. From top
down, prediction profiles of hydrophobicity; turn; and
union propensity for amphipathic α-helices and
β-strands, respectively. Spans: 4 for <pt>; others are
indicated in label subindices. For comparison,
predicted structures are shown at the top and bottom
panels. For the 12H model and for our prediction,
symbols are shown angled so that their lower and higher
ends correspond to their intra- and extracellular sides,
respectively.

FIGURE 5. Putative βB of GLUT1 viewed from inside
the cell. A molecule of β-D-glucopyranose is shown in
the center of the pore as a size marker (viewed
from C1).

FIGURE 6. Model of secondary structure of GLUT1
(SEQ ID NO:3). Putative 16 transmembrane β-strands are
represented by rectangles. The more hydrophilic sides
of the β-strands (presumably lining the pore) are
facing right. In the extramembrane loops, triangles
denote predicted turns, and rectangles mark predicted
α-helices. Of the two possible N-linked glycosylation
sites, N 45 and N 411, mutagenesis points to the first
(Asano et al., 1991, J. Biol. Chem. 266: 24632-24636).
Two epitopes, 217-272 and 386-405, and a sugar binding
site, Q 282 (Hashiramoto et al., 1992, J. Biol. Chem.
267: 17502-17507), are in boldface.

FIGURE 7. Union profiles are shaded. Cry:
information from high-resolution structures; t: turns.
(A) reaction center, L chain; (B) bacteriorhodopsin;
(C) colicin A; (D) Rhodobacter capsulatus porin; and
(E) Escherichia coli porin.

FIGURE 8. Arrows mark predicted β-strands. (A)
facilitative glucose transporter 1; (B) CHIP28; (C)
acetylcholine receptor α-subunit; (D) lactose permease;
(E) Na+/glucose cotransporter; (F) shaker K channel;
(G) calcium ATPase (sarcoplasmic reticulum); and (H)
H+/K+ ATPase. In 8a, 4th panel, the dotted lines
suggest the topological orientation of predicted
β-strands (intracellular at bottom). In 8d, panel 1,
the alkaline phosphatase activity reported for the
different fusions is superimposed on the H21 plot.
Number labels identify the fusions. Also shown there is
the 12H model for lac permease. As in 8a, a topological
orientation is suggested (intracellular at bottom).

FIGURE 9. Z scores have been normalized (scoreN)
for sequence length as in Park et al., 1992, Protein
Sci. 1: 1032-1049, namely:
scoreN = scoreorig/C*[1 - exp(A*sequence length + B)].
For each of the two environments depicted in panels (a)
and (b), the same set of 400 randomly chosen globular
proteins was run to generate a baseline distribution of
raw scores vs. sequence length.

5. Detailed Description Of The Invention

For clarity of presentation, and not by way of
limitation, the detailed description of the invention
is divided into the following subsections:

(i) proteins to which the inventive structural
determination method may be applied;
(ii) the Union algorithm;
(iii) the UNION program; and
(iv) the utility of the invention.

5.1. Proteins To Which The Inventive Structural
Determination Method May Be Applied

The methods of the present invention may be
applied to any protein, in order to determine the
propensity of portions of the protein to form α and β
structures.

In preferred embodiments of the invention, the
methods are applied to membrane proteins, particularly
proteins involved in transporting compounds between the
intracellular and the extracellular compartments. For
example, and not by way of limitation, the methods of
the present invention may be applied to the following
proteins and to each member of their respective
families: GLUT proteins (including but not limited to
erythrocyte glycophorin), bacterial porins (including
OmpC, OmpF, NmpA, NmpB, NmpC and LamB, etc.),
aquaporins, bacteriorhodopsin and the bacteriorhodopsin
precursor, the reaction center L chain, colicin A,
Rhodobacter capsulatus porin, E. coli porin, the
acetylcholine receptor α subunit, lac permease, the
sodium-glucose co-transporter, the shaker potassium ion
channel, sarcoplasmic reticulum calcium-ATPase,
components of the sodium ion/potassium ion pump, gap
junction proteins, cytokine receptors, the multidrug
resistance transporter, the cystic fibrosis conductance
regulator and "band III" protein of the erythrocyte
membrane.

5.2. The Union Algorithm 

The present invention provides for a Union
algorithm which is able to predict the presence of
amphiphilic α and/or β structures in proteins,
preferably membrane proteins, as set forth below.

The present invention provides for a method of
predicting the tendency of a portion of a protein to
form an amphiphilic α structure, said portion having a
span of x residues, wherein x is any integer,
comprising calculating a value for Uαx using the
equation Uαx = Hx + μαx - <pt>. Hx is the average
hydrophobicity for a span of x residues using the
Kyte-Doolittle scale. μαx is the hydrophobic moment
(span x) as calculated by the method set forth in
Eisenberg et al., 1984, Proc. Natl. Acad. Sci. U.S.A.
81: 140-144, for α structures, the angle between one
residue and the successive residue being that
associated with α-helices, such as about 90-110°, and
preferably 100°. <pt> is the position dependent turn
propensity, as calculated according to the method set
forth in Prevelige and Fasman, 1989, in "Prediction of
Protein Structure and the Principles of Protein
Conformation", Fasman, ed., Plenum Press, New York,
pp. 391-416 (assigned to residue 2 in a 4-point turn).
For example, a value of <pt> of a tetrapeptide is
calculated as <pt> = f(i) × f(i+1) × f(i+2) × f(i+3),
where "i" is the residue and f is the bend frequency at
each of the four positions of the β-turn.
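
As an illustration of the terms in the Union equation, the sketch below computes a hydrophobic moment in the manner of Eisenberg et al. (the magnitude of the vector sum of residue hydrophobicities, each rotated by a fixed angle per residue) and then combines the three terms. The function names, the use of an unnormalized sum for the moment, and the sample numbers are assumptions of this sketch and are not taken from the source code of the invention.

```python
import math


def hydrophobic_moment(values, angle_deg=100.0):
    """Magnitude of the hydrophobic moment of one window.

    `values` holds the hydrophobicity of each residue in the window, in
    sequence order. Successive residues are rotated by `angle_deg`
    (about 100 degrees for alpha-helices, about 160 degrees for
    beta-strands, per the text). The sum is not divided by the window
    length; that choice is an assumption of this sketch.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum)


def union_value(mean_hydrophobicity, moment, turn_propensity):
    """U = H + mu - <pt> for one window position."""
    return mean_hydrophobicity + moment - turn_propensity


if __name__ == "__main__":
    # Hypothetical window of seven hydrophobicity values.
    window = [4.5, -3.5, 3.8, -0.8, 4.2, -3.9, 1.8]
    h_mean = sum(window) / len(window)
    mu_alpha = hydrophobic_moment(window, angle_deg=100.0)
    print(union_value(h_mean, mu_alpha, turn_propensity=0.5))
```

A window whose hydrophobic residues recur with the periodicity of the chosen angle gives a large moment, which is what raises U for amphiphilic segments.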
The present invention also provides for a method
of predicting the tendency of a portion of a protein to
form an amphiphilic β structure, said portion having a
span of x residues, wherein x is any integer,
comprising calculating a value for Uβx using the
equation Uβx = Hx + μβx - <pt>. Hx is the average
hydrophobicity for a span of x residues using the
Kyte-Doolittle scale. μβx is the hydrophobic moment
(span x) as calculated by the method set forth in
Eisenberg et al., 1984, Proc. Natl. Acad. Sci. U.S.A.
81: 140-144, for β structures, the angle between one
residue and the successive residue being that
associated with β-structures, such as about 150-210°,
preferably 160°. <pt> is the position dependent turn
propensity, as calculated by the method set forth in
Prevelige and Fasman, 1989, in "Prediction of Protein
Structure and the Principles of Protein Conformation",
Fasman, ed., Plenum Press, New York, pp. 391-416
(assigned to residue 2 in a 4-point turn). For example,
a value of <pt> of a tetrapeptide is calculated as
<pt> = f(i) × f(i+1) × f(i+2) × f(i+3), where "i" is
the residue and f is the bend frequency at each of the
four positions of the β-turn.
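
The tetrapeptide product for <pt> can be sketched as follows. The bend-frequency table used here is a toy placeholder with made-up numbers for two residue types only; real values would come from the Prevelige and Fasman reference. The function name and the dictionary layout are likewise assumptions of this sketch.

```python
def turn_propensity(seq, bend_freq):
    """Position-dependent turn propensity for every tetrapeptide.

    `bend_freq[aa]` is a 4-tuple giving the bend frequency of residue
    type `aa` at turn positions 1 to 4. The product for the tetrapeptide
    starting at residue i is assigned to its second residue, so the
    returned dict is keyed by the 1-based number of that second residue.
    """
    result = {}
    for start in range(len(seq) - 3):
        product = 1.0
        for position in range(4):
            product *= bend_freq[seq[start + position]][position]
        result[start + 2] = product  # 1-based index of the second residue
    return result


if __name__ == "__main__":
    # Placeholder frequencies, NOT published values.
    toy_freq = {
        "A": (0.1, 0.2, 0.3, 0.4),
        "G": (0.5, 0.5, 0.5, 0.5),
    }
    print(turn_propensity("AGAGA", toy_freq))
```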

Odd numbers of residues are generally chosen in
assigning hydropathy values so that a given sum can
be plotted above the mid-residue of the segment. In
preferred, nonlimiting embodiments, the value of x is
seven or twenty-one. These values are preferable
because a length of seven residues represents the
shortest span that can be reliably used with a minimum
of localized "noise". A larger span of twenty-one
residues was also chosen since this represents the
average length of membrane spanning α-helices.

Accordingly, the extent of α or β structure
may be determined using the Union algorithm by
calculating the values for U as set forth above, for a
series of portions spanning the protein or a relevant
part of its structure, graphically depicting the
results of these calculations, and performing the
following analyses:

The α or β structure of the segments is interpreted
on the basis of the height and width of peaks in the
Uαx and Uβx profiles and the predicted hydrophobicity
of the segments. The height is determined relative to
a threshold. The threshold may be arbitrarily set at
0, or may be assigned a different value, depending on
the protein being analyzed. For example, in the case
of the porins (see Figure 3), peaks were considered to
originate at a threshold of zero, and were taken to
predict β-structure if their upper segment exceeded a
threshold set at about 2 (1.83 in one case, 2.15 in
another) on a scale set to range from -4.5 to +4.5
(see Example 6, below). The value of 2 was chosen
because it best fit the proteins known to have
β-structure that were used for calibration.
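
The two-threshold reading of a profile described above can be sketched as follows: a peak is a contiguous run of values above the base threshold, and it is kept only if its highest point exceeds the upper threshold. The function name, the default thresholds of 0 and 2, and the sample profile are assumptions chosen to mirror the porin example; they are not taken from the source code of the invention.

```python
def find_peaks(profile, base=0.0, top=2.0):
    """Locate peaks in a Union profile.

    A peak is a maximal run of consecutive values strictly above `base`.
    It is reported only if its maximum is strictly above `top`.
    Returns (first_residue, last_residue, height) tuples, 1-based and
    inclusive.
    """
    values = list(profile)
    peaks = []
    start = None
    # A sentinel equal to `base` closes any run that reaches the end.
    for index, value in enumerate(values + [base]):
        if value > base and start is None:
            start = index
        elif value <= base and start is not None:
            height = max(values[start:index])
            if height > top:
                peaks.append((start + 1, index, height))
            start = None
    return peaks


if __name__ == "__main__":
    # Hypothetical profile on the -4.5 to +4.5 scale.
    demo = [-1.0, 0.5, 2.5, 1.0, -0.2, 0.3, 1.5, -1.0, 3.0]
    print(find_peaks(demo))
```

The width of each reported peak (last residue minus first residue plus one) is the quantity compared against membrane-spanning lengths in the next step.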

Peaks wide enough to correspond to a segment of
the amino acid sequence long enough to span the
membrane as an α-helix (e.g., 18-22, preferably 20 or 21
residues) are predicted to be α structures. Peaks that
are too narrow to correspond to a segment of the amino
acid sequence long enough to span the membrane as an
α-helix, but which are wide enough to correspond to a
segment of the amino acid sequence with the correct
length to span the membrane as β-strands (e.g., as
short as 6 residues, preferably 9-14), are predicted to
be β structures. Transmembrane segments of a protein
may thereby be predicted to comprise amphiphilic
structure of either the α type or the β type, or both.
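
The width rule above can be expressed as a small decision function. The cut-offs of 18 residues for an α-helix and 6 residues for a β-strand are the example values given in the text; treating them as hard limits, and the function name itself, are assumptions of this sketch.

```python
def classify_peak(width, alpha_min=18, beta_min=6):
    """Assign a structure type to a peak from its width in residues.

    Peaks at least `alpha_min` residues wide can span the membrane as an
    alpha-helix. Narrower peaks that are still at least `beta_min`
    residues wide can span it as a beta-strand. Anything shorter is
    left unassigned.
    """
    if width >= alpha_min:
        return "alpha"
    if width >= beta_min:
        return "beta"
    return "unassigned"


if __name__ == "__main__":
    for width in (21, 12, 4):
        print(width, classify_peak(width))
```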
In preferred embodiments of the invention, the
foregoing methods may be practiced employing the source
code for the Union algorithm, as set forth in Section
5.2.1, below. The Uαx or the Uβx profiles generated
using the source code in Section 5.2.1 give a graphic
visualization of the Uαx or Uβx values, respectively,
of the segments from one end of the protein to the
other. This tracks the hydrophobic and hydrophilic
regions relative to a universal midline.

For example, and not by way of limitation, the use
of methods comprising the Union algorithm to identify
β-barrel structure in various proteins is set forth in
sections 5 and 7, below.
5.2.1. Source Code For The Union Algorithm
'/* PROGRAM UNION

'/* J. FISCHBARG, F. CZEGLEDY, P. ISEROVICH; OCT. 1992-OCT. 1993



5 ''^ « TURN POTENTIAL 

?0 In Ii2 S WTA PROM: 

" '""If POTPUT OATA TO: ^ 

12 f I ttoutS • patM ♦ ft (tout ♦ «.d»t« 

2 'P S -«"°«<C.JMifoutb«tin(l),2) 

17 > pj • round(unn.tphan(i), 2) 

18 # p6 • round(unnb.t»n{l), 2) 
» Xjc^^nCl). 2, 
21 Sttrfng 1 

2 J2.^if P ?!r?!, C> : ""WO. ANWtlOUTNio, nltnX ukni<» 



26 DEFINT |-N 

27 naa ■ 20 

28 NSEO • 1500 

29 rwtn ■ 26 

30 DIN •ynfaolsS(nM) 

31 DIN ttqh(NSEO) 

32 DIN saqn(NSEQ) 
« DIM ttqt(NSEO) 
3* DIM StqS(NSEO) 
35 DIN vfcytt(nu) 
3* DIN pturn(nM) 

37 DIN nwsitt(nM) 

38 DIN hout(NSEO) 

39 DIM hout*>tln(NSE0) 
*J 0IM aaphiouUNSEO) 

•J 'DIN houuh«ln(rts«q) 'normUtH 
« DIN houtspwnCNSEO) 'n™T ilT 

H J J JMPHI0UTAIPNAN(N$E0> 'noraalf,, 
«» DIN turnprop(NSEQ) 

** DIN tumpropn(NSEO) 'nonwlixad 

*7 DIM UNNALPNA(NSEQ) "orwuttd 

*8 DIM unntlph«n(NSEO) 'nonMliz*d 

49 DIN UNNBETA(NSEQ) n™ui«fl 

50 DIN unnbttan(NSEQ) 'nen»lfz«l 

51 DIN tun*rop.(N*EQ> 

52 DIN DUM(NSEO) 

53 DIM DUNN(NSEO) 'normmU.~4 
5* DIMpt(20, 4) ner^Uwd 

55 DIM PTSEO(NSEO) 

56 DIM ptMqn(NSEO) •non»lli«d 

58 cfT*"* " ^» IUI, 'wI!? 

59 'PRINT - ENTER PATH (WITH 
'INPUT path$

61 ptthS • -ClVSIMEAUFOITV 

62 0EFAULT8 • patHS ♦ ■DEFAULT.TW" 

63 OPCN DEFAULTS FOR INPUT AS #5 

64 INPUT #5, fUafi*. fHaouS, angalpha, anpbata, whel, apan, w 
xc CLOSE #S 

S fii.it** ■ i»ths ♦ mains ♦ ;.mj 

67 filaeutS • paths ♦ fUaeOS ♦ ".dat" 

66 MCytfOeellttla Mai* - 9 10 

69 • 1, 2. 3. 4. 5. 6, 7, 8. 9, 10 

S : I: 5: J: i: !: ■: i: 5: < 

79 DATA -2.9 .-0.5 . 4.5 . 1.7 . M .-0.3 .-2.3 , 4 5 .-0 5 .-4.5 

80 ♦ 11 , « , « . U . 15 . W . W . IS . w , iw 

SI data .3.5 : o.o ;-s.! ;-s.4 ; 4.2 : 3.4 ;.o.s ;.o.s : 1.0 ;.4.3 

83 
64 

BS start: 

86 CLS 

88 PftlNT "l! PILE NAME POt INPUT *«i««"P» 

m MiuT »9 file NAME POP OUTPUT •; fllaoUtS 

£ £ -3* ANCLE POt ALPHA/SETA STRUCTURES CHYMOPHOtiC MOMENT) * 

R S It - SK angalpha; "SETA STRUCTURE- -f «*»» 

92 MINT -4. WINDOW SIZE PO« NENSIANE HELICES •; WHEL 

s sis 3: suure.' uss» ««« •» « 

S SIS ASSISE ENTER C0mS9~.N0 NUNSE. - 

" MINT • IP T0UARE PJADT TO CONTINUE PtESS ENTER ■ 

98 INPUT DUMMY 

SELECT CASE DUMMY
CASE 1
    GOSUB FILENAMEINPUT
' CASE 2
'   GOSUB FILENAMEOUTPUT
CASE 3
    GOSUB MOMENTANGLE
CASE 4
    GOSUB ALPHAWINDOW
CASE 5
    GOSUB BETAWINDOW
CASE 6
    GOSUB UNNWINDOW
CASE 9
    GOTO salida
CASE 0
    OPEN DEFAULT$ FOR OUTPUT AS #6
    WRITE #6, filein$, fileou$, angalpha, angbeta, WHEL, span, UW
    CLOSE #6
    GOSUB WORKING
END SELECT
PRINT 3
GOTO start
122 

FILENAMEINPUT:
SHELL "DIR " + path$ + "*.SEQ/W"
LOCATE 20, 1
PRINT "ENTER FNAME <ONLY> FOR INPUT SEQ; PROG ADDS DEF FILE"
INPUT filein$
fileou$ = filein$
fileinp$ = path$ + filein$ + ".seq"
fileout$ = path$ + fileou$ + ".dat"
RETURN
133 

134 WORKING: 

135 OPEN f 1 tatnpS PON INPUT AS #1 
134 INPUT #1, aaquancaS 

137 CLOSE «1 

138 turninputS « paths ♦ "Inp.dat" 

139 OPEN turninputS FOR INPUT AS 13 

140 POR I ■ 1 TO 20 

141 POR ■ « 1 TO 4 

142 INPUT #3, ptCI. ft) 

143 NEXT ft 

144 NEXT I 

145 CLOSE 13 

146 POR n » 1 TO 20 

147 READ vkyta(n) 

148 ayafcolaS(n) - HIDS(aacoda*S, n, 1) 

149 NEXT n 

150 FOR n ■ 1 TO 20 

151 READ pturn(n) 'acquira Chou-Fataan turn potantiala 

152 NEXT n 

153 RESTORE 

154 PRINT « WORKING « 

155 nlan • LENftaquanca*) 

156 POR n » 1 TO nlan 

157 taqS(n) • M OS (aaquancaS, n, 1) 

158 NEXT n 

■159 « and of uMla loop 

1*0 FOR I • 1 TO nlan ' fraa 1 to Itngth of aaquanea •/ 

161 FOR k > 1 TO 20 

iS S5ci> • Xrltl "* Wi9n valu. t. raaldua*/ 

J2 E»T ,} " k ' iM,Bn p - ,dut 

167 NEXT k 

168 NEXT I 

169 'els 

171 Hm ' M * <l> « ' •~*U> 

172 for w w crtjn^l, Mi barfnn. of 

g . « « TO Cl^ m _ Mjap., thra^ « te u,^ , 
177 

1?! » ! ! llfWW ' *flna eantar of minkm •/ 

179 ' tumprofXa) - tumaee/J 'ealeulata avarafa turn prepanalty 



• pt(M** n ♦ 2>. 4) 
MKT ft 

PTSE0<1> • PTSE0<2> 
PTSEQCnlen) - PTSEQ<nlen • 2) 
PTSEO<nlan • D • W«"l«J» M 
CAIL NORNAUPTSEQO. nlan, pteaqnOJ 

» HTOROPHORICITT CALCULATION FOR MEMBRANE HELICES • / 
FLAG ■ 1 ' calculate hydrotobielty* / 

t a UMEL ' "indOW 

Lena main ' and we will ott hout<e» 

S?«SU!L(hout()rnl«. houtah.ln<» 'and *. -HI 9*t houtahalnV 

. NYOROPHORICITV CALCULATION FOR SNORT SPAN 
FUO ■ 1 * ealeulata hydrofebleity" / 
I , toan ' window 

LcTuih ' and wt will !•* hout(«> 

^NC^LlhoutoTnlS, LtapwO) 'and we will (ft houtap**"/ 



'calculation alpha 
FLAG ■ 0 
i • «pan - 

2Si L!a** 'and wa will «at eaphioutC)"/ 

S?NSiuL(a-phlout(). nlan, AMPNIOUTALPHAN(>>'and wa will Rtt 
AMPH I OUTALPHAH*/ 



' calculation beta a— ant*/ 
FLAG ■ 0 'givee MfJhiout output 
j ■ apan 
ANGLE * angbeto 

GOSUI MAIN 'wa gat bata nomnt. aaphiout(e))*/ 
CALL NORMAL (aaphioutO, nlan, AMPHIOUTIETANO) 'an 

'calculate union alpha 

CALL UN(pteeqn(), houtapann( ) , AMPHIOUTALPHAHO, n 
CALL NORMALIUNNALPHAO, nlan, umalphanO) 
'ealeulata union bata 

CALL UN<pt»aqn(), houtapannO, AMPHIOUTIETANO, n 
CALL NORMAL (UNNIETAC ) , nlan, umbetenO) 

GOSUI produeto 
PRINT 5 
RETURN 

'GOTO SALIDA
MOMENTANGLE:
PRINT "ENTER ANGLE FOR ALPHA STRUCTURES "
INPUT angalpha
PRINT "ENTER ANGLE FOR BETA STRUCTURES "
INPUT angbeta
RETURN

ALPHAWINDOW:
PRINT "ENTER WINDOW SIZE FOR MEMBRANE HELICAL SPANS (ODD NUMBER)"
INPUT WHEL
RETURN

BETAWINDOW:
PRINT "ENTER WINDOW SIZE FOR UNION SPAN (ODD NUMBER)"
INPUT span
RETURN

UNNWINDOW:
PRINT "ENTER WINDOW SIZE FOR SMOOTHING UNION"
INPUT UW
RETURN

250 I f J » 1 THEN 

251 1111 STARTING SEGMENT 

252 FOR ■ . 1 TO <i • 1) / 2 

253 LI ■ 1 'lOW SOUND ART 

2L B ■ * (J " 1) / 2, «*P«r BOUNDARY 

255 GOSUS CALCULATION 

256 NEXT ■ 

257 '— — • END SEGMENT 

258 FOR ■ > (nl«n ♦ 1 • (J-D/2) TO nltn «■ etr of window*/ 

259 LI ■ • - (j . i) / 2'low lOUNOART 
280 US > nion 

261 GOSUS CALCULATION 

262 NEXT ■ 

263 ' MAIN CENTER SEGMENT 

265 P«Y. C CjVl//V° tMm ' (i ' t)/2) ' " of *• window-/ 

266 l» • • ♦ (J • 1) / 2 

267 GOSUS CALCULATION 

268 NEXT ■ 

269 ENO IF 

270 RETURN 
271 

272 CALCULATION: 

273 IF FLAG > 1 THEN « calculate hydrophobic icy*/ 

275 tmT.'tMtom' ° ' r T* hydr ? phob,c,t >' «cc«ilator •/ 

| ^VLSm * «*, l0OP0ni thr8U » ' ll - { " 

277 CUM • CUR ♦ 1 

279 Me5? U ? < " > " / ' e-ip,,tt hydrophoblcity av»r.g« • 

280 ELSE 

281 t . o ' h *"»*"*ie «tnf/ 

282 ocm . o 

283 Mx • 01 

IS "for LB TO UB* r '* tt •ecuBjlatoraV 

in window •/ loco on I through all r«. 

£ 8 

«J My • My ♦ (y • wqhd), 

291 
292 

S l »"?J ,0Ot< " ) ■«■««- 2 ♦ My - 2, 

«5 f RETURN 



♦ 1 
NEXT I 



«■ ' if 0«ax <(j.»/?«it •w— 



«M)/2*1> th#n "'"<*» «il for aaaothino 
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S , S~<J-1>/2 wrbo—ry 

. U-nltn-Cj-U/2 'lo-bouidory 

tAc * Ut* nltn 

j£ » eOSURSMTH 

SI I rot mWn TO (nltn.(j.1)/2) • > eontor of th. *ln**V 

SJ . lB-srCM>/Z 

x\ v . , COSUi wth 

J 12 ' NEXT « 

313 i rtturn 

5J SEOACCUN • 0: ACCUN - O'RESET 

$ '"oAccui " Saccun * o«c.> 

sis ACCUN • ACCUN ♦ i 

S "^UWCrt • "«CCUN / ACQ* 

321 RETURN 

38 OP^Il'toutS fOR OUTPW AS #2 

3 jg 'PRINT #2. ^Jgr'-J r- J? '-X'-lS ' ' 

325 PRINT N2, "rt«", "«1". NT", pt . urn . 

326 FOR I ■ I TO nltn 

J27 PRINT -l. IS "i !• nlt ? 

328 pi • ro»nd(hout«pmn(l>. 2) 

329 • p2.rou*(«*»loutilphon >,2> 

33, pT» r««d<ptooqn<l>, *> „„ 

332 5 • roundC«<r»lplwntl>. » 

333 p6 • rouirfCi»**tan(l). 2) 

334 p7 . ro**Chout«htln<l>, 2) 

SI 'PRINT llfoplpMlgl'IP* 

337 print I; pT; p1; P*iJ*£* WmBmn •*» •«*>", -w". 

Ss 'print J2. -r«f-.-NyEl. • ^ ' pS. p6 

339 'PRINT f2. 1. P*» -ft at 

340 PRINT f2, l» g. Pi; P** **• 

341 NEXT I '«nd do not ton 

342 CLOSE f2 

343 RETURN 

345 PRINT " ••• MNE 

SUB NORMAL (DUM(), nlen, DUMN())
seqmax = DUM(1): seqmin = DUM(1): SEQCUM = 0
FOR I = 1 TO nlen
    IF seqmax < DUM(I) THEN seqmax = DUM(I)
    IF seqmin > DUM(I) THEN seqmin = DUM(I)
    SEQCUM = SEQCUM + DUM(I)
NEXT I
SEQAVG = SEQCUM / nlen 'average
FOR I = 1 TO nlen
    DUMN(I) = -4.5 + 9 * (DUM(I) - seqmin) / (seqmax - seqmin)
NEXT I
END SUB

SUB UN (ptseqn(), HOUTN(), AMPHIOUTN(), nlen, UNN())
FOR n = 1 TO nlen
    UNN(n) = HOUTN(n) + AMPHIOUTN(n) - ptseqn(n)
NEXT n
END SUB



5.3. The UNION Program

In preferred embodiments of the invention, the
Union algorithm may be employed together with certain
features of the Chou-Fasman-Prevelige, Kyte-Doolittle
and PHD algorithms. This combination is referred to
herein as the UNION program, source code for which is
set forth in 5.3.1., below. For purposes of formatting,
in a few instances, where the material for one line of
source code could not fit into the margins, it was
slightly indented and moved to the next line.

For example, and not by way of limitation, the
following method may be performed.

(1) Values of Hx, μβx and <pt> and the resulting
values of Uβx may be calculated as set forth in the
preceding section and their ranges scaled from -4.5 to
+4.5 (see example sections 6 and 7).

(2) The Union algorithm may be used to mark the
approximate location of the secondary structures. The
Uαx or the Uβx profiles give a graphic visualization of
the Uαx or Uβx values of the segments, respectively,
relative to a universal midline.

(3) The α or β structure of the segments may be
interpreted from the Uαx or Uβx profiles so that the
width of the peaks from either profile may be compared
to the actual distance needed to bridge a membrane.
The segments, and thus the protein, may be assigned an α
or β structure based on the length of peaks in the Uαx
and Uβx profiles and the predicted hydrophobicity of
the segments.

(4) The segments may be refined using the CFP
algorithm, as set forth in Prevelige and Fasman, 1989,
in "Prediction of Protein Structure and the Principles
of Protein Conformation", Fasman, ed., Plenum Press,
New York, pp. 391-416, to calculate the values for α
and β average propensities for tetrapeptides.

(5) Data from the neural network program PHD
(Rost and Sander, 1992, Nature 360: 540) may be added
as separate profiles of the segments.

(6) The various plots obtained from the methods
described in (1) - (5) may be combined in a single
figure for the global picture of an individual protein.
This step renders the data maximally informative and is
specified by the UNION program, the source code of
which is set forth in Section 5.3.1, below. The UNION
program runs in the IBM DOS or Microsoft-DOS
environments, using a columnar input ASCII file that
includes: (1) the amino-acid sequence of the protein
and (2) a corresponding sequence of literal secondary
structural assignment codes for that amino acid
sequence, either from the Brookhaven database for
proteins with known structure, or derived from
predictions for proteins of unknown secondary
structure. The literal structure codes are converted
into numbers and a columnar output file is generated.
Figures for data analysis may be conveniently obtained
by importing the UNION output into a graphics program:
"ORIGIN", Microcal Software, Northampton, MA 01060.

For example, and not by way of limitation, the use
of the UNION program to identify β-barrel structure in
various proteins is set forth in section 7, below.
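
The scaling of each profile to the range -4.5 to +4.5, referred to in step (1) above, can be sketched as a linear min-max rescaling. This is offered as one plausible reading of that step; the function name and the handling of a flat profile are assumptions of this sketch.

```python
def scale_profile(values, low=-4.5, high=4.5):
    """Linearly rescale a profile so its minimum maps to `low` and its
    maximum maps to `high`.

    A flat profile has no range to stretch, so it is rejected here
    (an assumption of this sketch).
    """
    vmin = min(values)
    vmax = max(values)
    if vmax == vmin:
        raise ValueError("profile is constant and cannot be rescaled")
    factor = (high - low) / (vmax - vmin)
    return [low + factor * (v - vmin) for v in values]


if __name__ == "__main__":
    # Hypothetical raw profile values.
    print(scale_profile([0.0, 5.0, 10.0]))
```

Placing hydrophobicity, hydrophobic moment and turn propensity on the same scale keeps any one term from dominating the sum that forms U.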
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5.3.1. Source Code For The Union Prom-^ 

'/* PROGRAM UNION (for Union, and Chou-Fasman-Prevelige)
'/* J. FISCHBARG, F. CZEGLEDY, P. ISEROVICH, J. LI; COPYRIGHT 1994 */
'/* COLUMBIA UNIVERSITY, NEW YORK
'/* TO CALCULATE AVG HYDROPHOBICITY, AMPHIPHILICITY AND TURN POTENTIAL
'/* OUTPUT COLUMNS FOR SYMPHONY OR ORIGIN
'/* please do not use word "UNION" in program (PB3 has command UNION)

' TAKES INPUT FROM DEFAULT TEXT FILE, NAMELY:
' DEFAULT$ = path$ + "union.INI"
' TAKES SEQUENCE DATA FROM:
' fileinp$ = path$ + filein$ + ".sqt"
' TAKES STRUCTURE INFORMATION FROM SAME SQT INPUT FILE:
' either from crystallog. or from preds., e.g., from PHD robot prediction
' WRITES OUTPUT DATA TO:
' fileout$ = path$ + "\" + fileou$ + ".dat"
' Columnar output generated is:
'1) res: residue number
'2) aa: amino acid code
'3) H21: Kyte-Doolittle hydrophobicity, span selected for large windows
'   (usually 21), assigned to center residue
'4) H7: Kyte-Doolittle hydrophobicity, span selected for small windows
'   (usually 7), assigned to center residue
'5) ua: Union for alpha structures, small window span (same as in H7)
'6) ub: Union for beta, etc.
'7) Pa: Chou-Fasman avg. alpha propensity for tetrapeptide (i,i+1,i+2,i+3)
'8) am: marker for suprathreshold CF alpha (4.5 value for ease of plotting)
'9) Pb: CF avg. beta propens. f. tetrap.
'10) bm: marker for suprath. CF beta (4.5)
'11) pt: Chou-Fasman position-dependent tetrapeptide turn propensity
'   (ass. to second residue)
'12) tm: marker for suprathreshold CF turn propensity (4.5)
'13) prda: alpha prediction marker (3.5 value)
'14) prdb: beta pred. marker (3.5)
'15) prdt: turn pred. marker (3.5).
' The last three lines merely represent the conversion of the information in
' the second line of the input file.
'//////////////////////////////////////////////////////////////////////

$string 2
$static
cls
'DECLARE SUB NORMAL (DUM!(), nlen%, DUMN!())
DECLARE SUB UN (ptseq!(), HOUT!(), AMPHIOUT!(), nlen%, UNN!())
'cls
common path$, filein$, fileinp$, nseq, houtsh(1), ptseq(1),
ubeta(1), hout(1)

print " MAXIMUM SEQUENCE LENGTH TO DIMENSION ALL ARRAYS BY"
print " (preferably <2000; if more, might be limited by memory)"
PRINT " Default = carriage return = 1999"
input nseq
if nseq=0 then nseq=1999
naa = 20 'unless dealing with extraterrestrials...
'NSEQ = 300 'maximum number of amino acids in sequence; sets array sizes
' program sensitive to this in the Power Basic environment
' however, if compiled, so far no limit encountered for the executable
DIM symbols$(naa)
DIM seqh(NSEQ)
DIM seqn(NSEQ)
DIM seq$(NSEQ)
DIM vkyte(naa)
DIM ptum(naa)
DIM hout(NSEQ)
DIM houtsh(NSEQ)
DIM amphi(nseq)
DIM amphiout(NSEQ)
DIM amphibeta(NSEQ)
DIM amphialpha(NSEQ)
DIM UALPHA(NSEQ)
DIM U(nseq)
DIM ubeta(NSEQ)
DIM DUM(NSEQ)
DIM DUMN(NSEQ)
DIM pai(naa) 'alpha propens. for individual amino acids
DIM pas(nseq) 'sequential indiv. alpha propens. along chain
DIM patetr(nseq)
DIM pam$(nseq)
DIM pbi(naa) 'beta propens. for individual amino acids
DIM pbs(nseq) 'sequential indiv. beta propens. along chain
DIM pbtetr(nseq)
DIM pbm$(nseq)
DIM pt(naa, 4)
DIM PTSEQ(NSEQ)
DIM ptm$(nseq)
DIM phda$(nseq)
DIM phdb$(nseq)
DIM phdt$(nseq)
DIM temp$(nseq)
DIM apos$(nseq)
DIM aneg$(nseq)
DIM aro$(nseq)
comienzo:
aacodes$ = "ARNDCQEGHILKMFPSTWYV"
alphacut = 100
betacut = 100
turncut = 0.75e-4
CLS
drive$ = "c:"
path$ = "\union"
DEFAULT$ = drive$ + path$ + "\" + "union.INI"
OPEN DEFAULT$ FOR INPUT AS #5
fileinp$ = drive$ + path$ + "\" + filename$ + ".sqt"
fileout$ = drive$ + path$ + "\" + filename$ + ".dat"

'Kyte-Doolittle scale
' 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
' A, R, N, D, C, Q, E, G, H, I
DATA 1.8,-4.5,-3.5,-3.5,2.5,-3.5,-3.5,-0.4,-3.2,4.5
' 11,12,13,14,15,16,17,18,19,20
' L, K, M, F, P, S, T, W, Y, V
DATA 3.8,-3.9,1.9,2.8,-1.6,-0.8,-0.7,-0.9,-1.3,4.2

'CHOU-FASMAN 64 protein database **********************

is^s^ss^ 

DATA R,100, 94,0.070,0.106,0.099,0.085
DATA N, 78, 66,0.161,0.083,0.191,0.091
DATA D,106, 66,0.147,0.110,0.179,0.081
DATA C, 95,107,0.149,0.053,0.117,0.128
DATA Q,112,100,0.074,0.098,0.037,0.098
DATA E,144, 51,0.056,0.060,0.077,0.064
DATA G, 64, 87,0.102,0.085,0.190,0.152
DATA H,112, 83,0.140,0.047,0.093,0.054
DATA I, 99,157,0.043,0.034,0.013,0.056
DATA L,130,117,0.061,0.025,0.036,0.070
DATA K,121, 73,0.055,0.115,0.072,0.095
DATA M,132,101,0.068,0.082,0.014,0.055
DATA F,111,123,0.059,0.041,0.065,0.065
DATA P, 55, 62,0.102,0.301,0.034,0.068
DATA S, 72, HO. 120,0. 139,0. 125,0. 106 
DATA T, 78,133,0.086,0.108,0.065,0.079
DATA W,103,124,0.077,0.013,0.064,0.167
DATA Y, 73,131,0.082,0.065,0.114,0.125
DATA V, 97,164,0.062,0.048,0.028,0.053

'//////////////////////////////////////////////////////////////////////

CLS
'print "Free memory: ";fre(0); fre(-1); fre(-2)
PRINT "UNION ALGORITHM; J. Fischbarg, F. Czegledy, P. Iserovich, J. Li."
print " Set for sequence lengths up to " nseq
print " "

START:
DUMMY = 0
PRINT " ENTER ONE OF THE FOLLOWING "
PRINT " "
PRINT "1. CHANGE FILE NAME FOR INPUT; currently: "; fileinp$
'PRINT "2. CHANGE FILE NAME FOR OUTPUT; currently: "; fileout$
PRINT "2. CHANGE ANGLE FOR ALPHA/BETA STRUCTURES (HYDROPHOBIC MOMENT) "
PRINT " ALPHA STRUCTURE= "; angalpha; "BETA STRUCTURE= "; angbeta
PRINT "3. CHANGE A.A. SPAN FOR MEMBRANE HELICES; currently: "; WHEL
PRINT "4. CHANGE A.A. SPAN FOR UNION; currently: "; span
PRINT "5. CHANGE PATH; CURRENTLY: " path$
print "6. CHANGE DRIVE; CURRENTLY: " drive$
'PRINT "6. CHANGE WINDOW SIZE FOR SMOOTHING UNION; currently: "; UW
PRINT "9. TO END SESSION WITHOUT RUNNING"
PRINT "0. (DEF=CR) MAIN - RUN WITH CURRENT PARAMETERS - RUNS ONLY ONCE AND EXITS"
print
PRINT ""
INPUT DUMMY
SELECT CASE DUMMY
CASE 1 : GOSUB FILENAMEINPUT
'CASE 2 : GOSUB FILENAMEOUTPUT
CASE 2 : GOSUB MOMENTANGLE
CASE 3 : GOSUB ALPHAWINDOW
CASE 4 : GOSUB BETAWINDOW
CASE 5 : GOSUB NEWPATH
GOTO COMIENZO 'road under repairs - monkeying with discouraged
'CASE 6 : GOSUB UNNWINDOW
CASE 9 : GOTO salida
CASE 0 : GOTO correte
END SELECT
GOTO start

'//////////////////////////////////////////////////////////////////////
producto:
OPEN fileout$ FOR OUTPUT AS #2

f*? 1 . S 05 " M " 05 " H21 " <* "H 7 " <* "ua" c$ W c$ -P fl - ct 

FOR I = 1 TO nlen
locate 16,1
PRINT "I= "; I; " nlen= "; nlen
hlng = round(hout(I), 2)
hsh = round(houtsh(I), 2)
ua = round(ualpha(I), 2)
ub = round(ubeta(I), 2)
pa = round(patetr(I), 2)
pb = round(pbtetr(I), 2)
pt = round(ptseq(I), 2)
PRINT #2,1 c$ seqSG) c$ hlng c$ hsh c$ ua cS ub c$ na cS Mm tm 

NEXT I 

CLOSE #2 ^d do next row 

newpath: 

P ",«r C ™°™«Xno,e » ^ x „ ^ ^ 

if test$="" then goto newpath
path$ = test$
DEFAULT$ = drive$ + path$ + "\" + "union.INI"
fileinp$ = drive$ + path$ + "\" + filename$ + ".sqt"
fileout$ = drive$ + path$ + "\" + filename$ + ".dat"
return

'//////////////////////////////////////////////////////////////////////

correte: 'main routine - records parameters and runs 

OPEN DEFAULTS FOR OUTPUT AS #6 
^*MriveS.p^^ 

GOSUB WORKING 
PRINT 

SroVw^i^f 3 RUN ^"SFULLY - STOPPING NOW 
'GOTO start

FILENAMEINPUT:
cls
chdrive drive$
chdir path$
files "*.SQT"
files "*.DAT"
LOCATE 20, 1
PRINT "ENTER FNAME (ONLY) FOR INPUT SEQ; PROG ADDS DEF FILE TYPES .SQT & .DAT"
INPUT filename$
fileinp$ = drive$ + path$ + "\" + filename$ + ".sqt"
fileout$ = drive$ + path$ + "\" + filename$ + ".dat"
RETURN

MOMENTANGLE:
PRINT "ENTER ANGLE FOR ALPHA STRUCTURES "
INPUT angalpha
PRINT "ENTER ANGLE FOR BETA STRUCTURES "
INPUT angbeta
RETURN

ALPHAWINDOW:
PRINT "ENTER WINDOW SIZE FOR MEMBRANE HELICAL SPANS (ODD NUMBER)"
INPUT WHEL
RETURN

BETAWINDOW:
PRINT "ENTER WINDOW SIZE FOR UNION SPAN (ODD NUMBER)"
INPUT span
RETURN

UNNWINDOW:
PRINT "ENTER WINDOW SIZE FOR SMOOTHING UNION"
INPUT UW
RETURN

'//////////////////////////////////////////////////////////////////////

WORKING:
print fre(0); fre(-1); fre(-2)
OPEN fileinp$ FOR INPUT AS #1
INPUT #1, sequence$
input #1, structure$
CLOSE #1

FOR n = 1 TO 20
READ vkyte(n)
symbols$(n) = MID$(aacodes$, n, 1)
NEXT n

FOR i = 1 TO 20
RESTORE 

print" i;:""T mmmmm 

fr~L.m (xqutnctS) WTcforChou-Fasman-Preve.ige.ct.pcp.idcs 
FOR n = 1 TO nlen
seq$(n) = MID$(sequence$, n, 1) 'list of aa codes
NEXT n

FOR I = 1 TO nlen ' from 1 to length of sequence */
FOR k = 1 TO 20
IF seq$(I) = symbols$(k) THEN ' identify ordinal FOR aa */
seqh(I) = vkyte(k) ' assign hydrophobicity value to residue */
seqn(I) = k ' assign residue name number */
pas(I) = pai(k) 'assign alpha propensity
pbs(I) = pbi(k) 'assign beta propensity
EXIT FOR 'done here; leave for/next loop
END IF
NEXT k
NEXT I



FOR n = 2 TO (nlen -2) 
erase seqn 

PTSEQ(1) = PTSEQ(2)
PTSEQ(nlen) = PTSEQ(nlen - 2)
PTSEQ(nlen - 1) = PTSEQ(nlen - 2)
for i = 1 to nlen
if ptseq(i) >= turncut then
for ind = 0 to 3
ptm$(i + ind) = "4.5 " : next ind : goto cortada
end if
if ptseq(i) < turncut then
if ptm$(i) = "4.5 " then goto cortada
else
ptm$(i) = " "
end if
cortada:
next i

CALL NORMAL(PTSEQ(), nlen, ptseq())



'//////////////////////////////////////////////////////////////////////
' HYDROPHOBICITY CALCULATION FOR MEMBRANE HELICES */
FLAG = 1 ' calculate hydrophobicity */
j = WHEL ' window
GOSUB MAIN ' and we will get hout(m)
CALL NORMAL(hout(), nlen, hout()) 'and we will get hout long */

'//////////////////////////////////////////////////////////////////////
' HYDROPHOBICITY CALCULATION FOR SHORT SPAN
FLAG = 2 ' calculate hydrophobicity */
j = span ' window
GOSUB MAIN ' and we will get houtsh(m)
CALL NORMAL(houtsh(), nlen, houtsh()) 'and we will get hout short */

'//////////////////////////////////////////////////////////////////////
' CALCULATION OF TETRAPEPTIDE PROPENSITIES
j = cfspan
for i = 1 to nlen-3
patetr(i) = ( pas(i) + pas(i+1) + pas(i+2) + pas(i+3) )/cfspan
pbtetr(i) = ( pbs(i) + pbs(i+1) + pbs(i+2) + pbs(i+3) )/cfspan
if patetr(i) >= alphacut then
pam$(i) = "4.5 "
else
pam$(i) = " "
end if
if pbtetr(i) >= betacut then
pbm$(i) = "4.5"
else
pbm$(i) = " "
end if
next i

erase pas, pbs

for j = 2 to 0 step -1 'approximate bottom ends
patetr(nlen-j) = patetr(nlen-3)
pbtetr(nlen-j) = pbtetr(nlen-3)
next j

CALL NORMALPA(patetr(), nlen, patetr())
CALL NORMALPB(pbtetr(), nlen, pbtetr())
' calculation alpha moment */
FLAG = 0 'selects amphiout output
j = span
ANGLE = angalpha
GOSUB MAIN 'gets amphiout(m) */
CALL NORMAL(amphiout(), nlen, amphialpha()) 'gets amphialpha */

'//////////////////////////////////////////////////////////////////////
' calculation beta moment */
FLAG = 0 'selects amphiout output
j = span
ANGLE = angbeta
GOSUB MAIN 'gets amphiout(m) */
CALL NORMAL(amphiout(), nlen, amphibeta()) 'gets amphibeta

'//////////////////////////////////////////////////////////////////////
'calculate union alpha
CALL UN(ptseq(), houtsh(), amphialpha(), nlen, ualpha())
erase amphialpha
CALL NORMAL(ualpha(), nlen, ualpha())

'calculate union beta
CALL UN(ptseq(), houtsh(), amphibeta(), nlen, ubeta())
erase amphibeta
CALL NORMAL(ubeta(), nlen, ubeta())
erase amphi
erase seqh

'//////////////////////////////////////////////////////////////////////

posm$ = "2.5 " : negm$ = "2.0 " : arom$ = "55"
FOR n = 1 TO nlen
temp$(n) = MID$(structure$, n, 1) 'list of structure codes
NEXT n

for i = 1 to nlen
if temp$(i) = "H" then
phda$(i) = alfam$ : phdb$(i) = " " : phdt$(i) = " "
end if
if temp$(i) = "E" then
phda$(i) = " " : phdb$(i) = betam$ : phdt$(i) = " "
end if
if temp$(i) = "C" then
phda$(i) = " " : phdb$(i) = " " : phdt$(i) = " "
end if
if temp$(i) = "T" then
phda$(i) = " " : phdb$(i) = " " : phdt$(i) = turnm$
end if

if seq$(i) = "F" or seq$(i) = "Y" or seq$(i) = "W" then
aro$(i) = arom$
else
aro$(i) = " "
end if

if seq$(i) = "E" or seq$(i) = "D" then
aneg$(i) = negm$
else
aneg$(i) = " "
end if

if seq$(i) = "R" or seq$(i) = "K" then
apos$(i) = posm$
else
apos$(i) = " "
end if
next i
close
erase temp$
GOSUB producto
return

'//////////////////////////////////////////////////////////////////////
MAIN: 'window size j is already defined
IF j > 1 THEN

' STARTING SEGMENT
FOR m = 1 TO (j - 1) / 2 '1 to 10
LB = 1 'LOW BOUNDARY
UB = m + (j - 1) / 2 'upper BOUNDARY m+10
GOSUB CALCULATION
NEXT m

' MAIN CENTER SEGMENT '11 to nlen-10
FOR m = (j + 1) / 2 TO (nlen - (j - 1) / 2) 'm center of the window
LB = m - (j - 1) / 2 'm-10
UB = m + (j - 1) / 2 'm+10
GOSUB CALCULATION
NEXT m

' END SEGMENT 'nlen-9 to nlen
FOR m = (1 + nlen - (j - 1) / 2) TO nlen 'm ctr of window
LB = m - (j - 1) / 2 'low BOUNDARY m-10
UB = nlen
GOSUB CALCULATION
NEXT m
END IF
RETURN

'//////////////////////////////////////////////////////////////////////
CALCULATION:
IF FLAG = 1 THEN ' calculate hydrophobicity of std. tm. segmts. */
cumh = 0: cum = 0 ' reset hydrophobicity accumulators */
FOR I = LB TO UB ' loop on i through all res. in window */
cumh = seqh(I) + cumh
cum = cum + 1
NEXT I
hout(m) = cumh / cum ' compute hydrophobicity average */

ELSEIF FLAG = 2 THEN ' calculate hydrophobicity of short tm. segmts. */
cumh = 0: cum = 0 ' reset hydrophobicity accumulators */
FOR I = LB TO UB ' loop on i through all res. in window */
cumh = seqh(I) + cumh
cum = cum + 1
NEXT I
houtsh(m) = cumh / cum ' compute hydrophobicity average */

ELSEIF FLAG = 0 THEN ' calc. hydrophobic moment */
t = 0 : acum = 0 : Mx = 0! : My = 0! 'reset amphi accumulators */
FOR I = LB TO UB 'loop on i through all res. in window */
x = COS(2 * 3.1416 * ANGLE * (I - LB) / 360) 'Eisenberg
y = SIN(2 * 3.1416 * ANGLE * (I - LB) / 360)
Mx = Mx + (x * seqh(I))
My = My + (y * seqh(I))
acum = acum + 1
NEXT I
amphiout(m) = SQR(Mx ^ 2 + My ^ 2)
END IF
RETURN
'//////////////////////////////////////////////////////////////////////
salida:
print " *** done *** "
stop
end

SUB NORMAL (DUM(), nlen, DUMN())
ytop# = DUM(1) : ybot# = DUM(1) : yCUM# = DUM(1) ' reset max, min & avg accumulators */
FOR I = 2 TO nlen ' MAX AND MIN DETERMINATION */
IF DUM(I) > ytop# THEN
ytop# = DUM(I)
yhord = i
end if
IF DUM(I) < ybot# THEN
ybot# = DUM(I)
ylord = i
end if
yCUM# = yCUM# + DUM(I)
NEXT I
yAVeraG = yCUM# / nlen 'average
FOR I = 1 TO nlen
'print DUM(I)
'print (DUM(I) - ybot#);(ytop# - ybot#);(DUM(I) - ybot#) / (ytop# - ybot#)
DUMN(I) = -4.5 + 9 * (DUM(I) - ybot#) / (ytop# - ybot#)
NEXT I
END SUB

'//////////////////////////////////////////////////////////////////////
SUB UN (ptseq(), HOUTsh(), AMPHI(), nlen, U())
FOR m = 1 TO nlen
U(m) = HOUTsh(m) + AMPHI(m) - ptseq(m)
NEXT m
END SUB

SUB NORMALPA (DUM(), nlen, DUMN())
deltalfa# = 75
llalfa# = 64
FOR I = 1 TO NLEN
DUMN(I) = -4.5 + 9 * (DUM(I) - llalfa#) / (deltalfa#)
NEXT I
END SUB

SUB NORMALPB (DUM(), nlen, DUMN())
deltabeta# = 106
llbeta# = 51
FOR I = 1 TO NLEN
DUMN(I) = -4.5 + 9 * (DUM(I) - llbeta#) / (deltabeta#)
NEXT I
END SUB
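
For readers without a BASIC environment, the windowing of the MAIN routine and the two quantities computed in CALCULATION can be sketched in Python. This is an illustrative re-expression under stated assumptions, not the listing itself: the function names are hypothetical, `math.pi` replaces the constant 3.1416, and the window is truncated at the chain ends in the same way as the starting and end segments of MAIN.

```python
import math

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window(center, length, span):
    """1-based inclusive bounds of an odd span centred on `center`,
    truncated at the ends of the chain."""
    half = (span - 1) // 2
    return max(1, center - half), min(length, center + half)


def mean_hydropathy(sequence, span):
    """Average hydropathy of each window, assigned to the centre residue."""
    n = len(sequence)
    profile = []
    for m in range(1, n + 1):
        lb, ub = window(m, n, span)
        values = [KD[sequence[i - 1]] for i in range(lb, ub + 1)]
        profile.append(sum(values) / len(values))
    return profile


def hydrophobic_moment(sequence, span, angle_deg):
    """Magnitude of the hydrophobic moment of each window.

    Successive residues are rotated by angle_deg (100 for alpha and
    160 for beta structures in the text), starting from the low boundary.
    """
    n = len(sequence)
    profile = []
    for m in range(1, n + 1):
        lb, ub = window(m, n, span)
        mx = my = 0.0
        for i in range(lb, ub + 1):
            theta = math.radians(angle_deg * (i - lb))
            mx += math.cos(theta) * KD[sequence[i - 1]]
            my += math.sin(theta) * KD[sequence[i - 1]]
        profile.append(math.hypot(mx, my))
    return profile
```

As in the listing, the moment is the unnormalized vector sum over the window; it is brought onto the -4.5 to +4.5 scale only afterwards by the NORMAL routine.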
5.4. The Utility Of The Invention
The present invention provides for a method of
predicting the structure of membrane proteins, which
may be used in the following non-limiting embodiments.
In preferred embodiments, the method of the
invention may be used to identify β-barrel structures
in membrane proteins. The identification of β-barrel
structure may be consistent with the function of the
membrane protein as a translocator. As such, the
present invention may be used to discern the function
of membrane proteins, the function of which has been
hitherto unknown.

Further, the identification of β-barrel structure
in a protein may lead to the identification of
molecules that can be transported by the protein. For
example, the identification of a structure similar to
members of the GLUT family of proteins in a particular
protein would suggest that the protein may be able to
translocate compounds similar to hexose compounds
through a cell membrane containing that protein. Such
an analysis may aid in the rational design of
pharmaceutical agents that could be used to access a
cell expressing the protein in its membrane.

In further embodiments, the present invention may
be used to design or identify compounds able to be
transported by animal or plant aquaporins (Chrispeels
and Agre, TIBS, 1994, : 421-425). In the case of
animal aquaporins, the channel forming integral protein
(CHIP), abundant in certain plasma membranes, and other
homologs suggest that some of these proteins may be
involved in clinical syndromes. Plant aquaporins like
Tonoplast intrinsic protein (γ-TIP) can be used to
study the role of these molecules in the water economy
of plants, as well as to create transgenic plants that
express these proteins from tissue specific promoters.
Drought-resistance and hardiness in crop plants may be
correlated with the presence and activity of these
proteins. The present invention can be used to
address the current problems present in analyzing and
manipulating the molecular structure and function of
this family of membrane proteins.

In still further embodiments, the present
invention may be used to engineer proteins having
useful β-barrel structures. For example, the transport
ability of a number of aquaporin proteins may be compared, and
the particular protein having the most favorable
transport capability may be identified. The method of
the present invention may then be used to analyze its
structure, and the secondary structures of other
membrane proteins may be manipulated to resemble the
structural characteristics of the designated aquaporin.

6. EXAMPLE: EVIDENCE THAT FACILITATIVE GLUCOSE
TRANSPORTERS MAY FOLD AS β-BARRELS

6.1. MATERIALS AND METHODS

Antibody studies. We raised three polyclonal
antibodies ("Abs") in rabbits and used the IgG
fractions. They were Ab-1, against the last 21
C-terminal amino acids of the GLUT1 protein; Ab-4, against
the last 25 C-terminal amino acids of the GLUT4 protein
(Ab-1 specifically reacted with GLUT1 but not with
GLUT2 or GLUT4, and Ab-4 reacted with GLUT4 but not
with GLUT1 or GLUT2, as assessed by immunoprecipitation
and immunoblotting); and Ab-c, raised against a synthetic
peptide containing the sequence Ile-386-Ala-405 in
GLUT1, a sequence that is highly conserved in all
members of the GLUT family. Ab-c reacted with the
GLUT1, GLUT2 and GLUT4 isoforms of mammalian
facilitative transporters as assessed by
immunoprecipitation and immunoblotting, and the
reactivity was specifically blocked by competition with
an excess of the peptide used to generate the Ab but
not by an unrelated peptide. For the experiments, all
Abs were suspended at a final concentration of 100 μg
of IgG per ml in modified (Vera et al., 1990, Mol. Cell
Biol., 10:743-751) Barth's solution (MBS).
Xenopus laevis oocytes were isolated as described
(Vera et al., 1990, Mol. Cell Biol., 10:743-751) and
injected with 50 nl of water containing 10-20 ng of in
vitro synthesized capped RNA (Vera, supra.) encoding
either GLUT1, GLUT2, or GLUT4, and incubated in MBS.
Three days after RNA injection, uptake of 2-deoxy-
[1,2(n)-3H]D-glucose (3H-DOG) was measured using a
10-min uptake assay (Vera, supra.). Oocytes were placed
into 1 ml of MBS containing 0.5 mM DOG and 1-5 μCi of
3H-DOG per ml (10 Ci/mmol; 1 Ci = 37 GBq; NEN/DuPont).
Ten pooled oocytes yielded an uptake value; values were
consistent within a given batch of oocytes.

Alignments. We used the BESTFIT and PILEUP
routines of the GCG (Genetics Computer Group; Version
7.0) program package, with gap weight = 3.0 and length
weight = 0.1 (Needleman et al., 1970, J. Mol. Biol.,
48:443-453). We aligned the sequences of Rhodobacter
capsulatus porin (SEQ ID NO:2; Weiss et al., 1991, FEBS
Lett., 280:379-382), Escherichia coli porin (SEQ ID
NO:1; Sw:Ompf-Ecoli), and GLUT1 (SEQ ID NO:3; Sw:Gtr1-
Human).

Predictions. We developed an algorithm ("Union")
to predict protein segments with relatively high
hydrophobicity and propensity to form amphiphilic α or
β structures. For a residue span length i, Union (U)
is: U_xi = H_i + μ_xi - <pt> (Equation 1).

Depending on the structure for which U is calculated,
the subindex x stands for either α or β. H_i
is the average hydrophobicity for a span of i residues
using the Kyte-Doolittle scale (Kyte, et al., 1982, J.
Mol. Biol., 157:105-132); μ_xi is the hydrophobic
moment (Eisenberg et al., 1984, Proc. Natl. Acad. Sci.
U.S.A., 81:140-144; span i) for either α or β
structures; the angles between a residue and the next
for α and β structures were 100° and 160°,
respectively, using standard values for α-helices and
the generic twist of β-sheets. H_i and μ_xi values were
assigned to the center residue of given odd-valued
spans. <pt> is the position-dependent turn propensity
(Prevelige and Fasman, 1989, in "Prediction of Protein
Structure and the Principles of Protein Conformation",
Fasman, ed., Plenum Press, New York, pp. 391-416;
assigned to residue 2 in the 4-point turn). We
calculated values of H_i, μ_xi, and <pt> for a given
sequence and scaled their ranges to -4.5 to +4.5 in
each case. After algebraic addition (Eq. 1), the U_xi
values obtained were in turn rescaled to -4.5 to +4.5.
We used Union profiles to mark the approximate
locations of secondary structures. Segments were then
refined by using (i) the Chou-Fasman-Prevelige
prediction method (CFP), which requires judgments by
the operator, and (ii) the results from a neural
network prediction program [PHD: profile neural network
prediction, Heidelberg (Rost and Sander, 1992, Nature,
360:540)], which runs unbiased, without human
intervention. We found it convenient to display propensity
profiles using the program PSAAM (Crofts, A.R., 1992,
Ph.D. Dissertation, University of Illinois, Unknown).
Three-dimensional modeling was done in the Insight and
Discover graphical environments (Biosym Technologies,
San Diego).
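
The scaling and combination steps of the Union calculation can be sketched in Python as follows. This is a minimal illustration assuming the three per-residue profiles (average hydrophobicity, hydrophobic moment, and turn propensity) have already been computed; the function names are hypothetical, and the arithmetic follows the NORMAL and UN routines of the listing in Section 5.3.1.

```python
def rescale(values, low=-4.5, high=4.5):
    """Linearly map a profile so its minimum is `low` and its maximum `high`."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("a flat profile cannot be rescaled")
    width = high - low
    return [low + width * (v - vmin) / (vmax - vmin) for v in values]


def union_profile(hydropathy, moment, turn):
    """Union = scaled hydropathy + scaled moment - scaled turn propensity,
    with the sum rescaled once more to the -4.5 to +4.5 range."""
    h = rescale(hydropathy)
    mu = rescale(moment)
    pt = rescale(turn)
    raw = [a + b - c for a, b, c in zip(h, mu, pt)]
    return rescale(raw)


# Toy three-residue profiles, for illustration only.
u = union_profile([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
```

Because each input is first mapped to the same range, the three terms enter the sum with equal weight; a residue scores high only when hydrophobicity and amphiphilicity are high and the turn propensity is low.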






6.2. Results And Discussion

Effect of Abs on the Function of Mammalian Hexose
Transporters Expressed in X. laevis oocytes. The
highly conserved sequence (Ile-386-Ala-405 in GLUT1) is
predicted to be intracellular in the 12H model
(Mueckler et al., 1985, Science, 229:941-945), which
locates it between its putative tm regions 10 and 11.
Given the evidence for an important functional role for
the region between tm domains 4 and 12 in GLUT1
(Carruthers, 1990, Physiol. Rev., 70:1135-1175), we
reasoned that an Ab against that conserved sequence
might elicit inhibition or activation of the
transporter. After verifying its reactivity, we used
X. laevis oocytes expressing different members of the
mammalian GLUT family to study the effect of this
anti-peptide Ab on the uptake of DOG. Incubation with Ab
for 1 hour induced a measurable increase in the ability
of oocytes expressing any of the three mammalian
transporters tested, namely GLUT1 (Fig. 1A, c), GLUT2
(Fig. 1B, c), and GLUT4 (Fig. 1C, c), to take up DOG.
The Ab, however, acted only when present in the
extracellular medium (Fig. 1A-C, c). No effect on
uptake was observed when the Ab was injected into the
oocytes 1 hr before the uptake measurements (Fig. 1A-C).
The effect of Ab was dose dependent (Fig. 1D) and
was specifically blocked by competition with excess
peptide during the incubation period (Fig. 1E). The
effect of the Ab on DOG uptake was evident after a
short incubation period; near-maximal levels of
activation were reached in ≈30 min (Fig. 1F), and
incubation for several hours induced an additional
increase in uptake (Fig. 1F).

To determine whether the GLUTs were expressed with
the correct orientation in the membrane of the oocytes,
we tested the effect of two other anti-peptide Abs we
elicited against the C-terminal regions of GLUT1 and
GLUT4. It was known from previous studies that this
region of the transporters is located intracellularly
(Oka et al., 1990, Nature, 345:550-553). As expected,
the Abs did not affect the capacity of the oocytes to
take up DOG when added extracellularly (Fig. 1 A-C) but
caused a specific and measurable increase in the
ability of oocytes expressing GLUT1 or GLUT4 (but not
GLUT2) to take up DOG when injected intracellularly
(Fig. 1 A-C). These observations are consistent with
previous indications that the C-terminal region is
central to the function of the transporter (Oka,
supra.).

Since both the Ab (Ab-c) and insulin (Vera, et
al., 1990, Mol. Cell Biol., 10:743-751) increase DOG
uptake in oocytes, we investigated whether Ab could act
by mimicking insulin rather than by specifically
binding to GLUTs. The results in Fig. 1 G-I suggest
instead that the Ab and insulin have different
mechanisms of action. Incubation of the oocytes with
insulin did not affect the Km of the transporters for
DOG, increasing instead the Vmax (Fig. 1 H and I; Table
1). This is consistent with insulin inducing the
translocation of transporters to the cell membrane. On
the other hand, the Ab induced a measurable decrease in
the Km for DOG in oocytes expressing either GLUT1 or
GLUT4 without changing the Vmax (Fig. 1 H and I; Table
1). The short-term effect of the Ab on uptake (Fig.
1F) can be accounted for by an increased affinity of
the transporters for DOG. The additional increase in
uptake observed after long incubation periods with the
Ab (Fig. 1F) may be due to the entrapment of the
transporters at the level of the cell membrane.



Table 1. Kinetic parameters (Km and Vmax) of DOG uptake in oocytes expressing GLUT1 or GLUT4.

Additional evidence for the different modes of
action of the Ab and insulin came from experiments in
which oocytes were first treated with insulin and then
with the Ab and vice versa. Under the first condition,
the Ab induced a further 2-fold increase in uptake in
oocytes pretreated with insulin (for a total 4-fold
increase; Fig. 1G). Quantitatively, this result is
consistent with the effect of the Ab on the affinity of
the transporter for DOG. On the other hand, insulin
did not affect the uptake of DOG in oocytes previously
treated with the Ab (Fig. 1G). One explanation for
this finding is that the binding of the Ab to the
transporter may "anchor" it to the plasma membrane and
disrupt the dynamic equilibrium that allows insulin to
modify the ratio of transporters located
intracellularly versus those located at the plasma
membrane.

The topology deduced from the Ab findings
compromises 12H. A possible explanation for the effect of
Ab recognition of the sequence Phe-389-Ala-403 in terms
of the 12H model is to argue that perhaps tm helices 10
and 11 are in a highly mobile segment of the protein,
leading to the exposure of the internal loop between
them to the extracellular medium. There is an
α-helical membrane protein, colicin, which appears to
externalize some of its α-helices during large scale
conformational changes (Parker et al., 1992, J. Mol.
Biol., 224:639-657). Externalization, however, shuts
off the colicin channel, while in the present case
uptake by GLUTs is enhanced by the Ab-c, arguing
against a colicin-type mechanism. Moreover, the Ab-c
had no effect when injected intracellularly, further
evidence against the intracellular location of Phe-389-
Ala-403. The simplest explanation for our findings is
that the loop comprising the segment Phe-389-Ala-403 is
normally located on the extracellular side of the
membrane, suggesting a topology inconsistent with the
12H model. If GLUTs are multihelical, with tm helices
≈20 residues long, and if putative helices 11 and 12
exist, then the conserved loop could only be
intracellular, being separated from the intracellular
C-terminal loop by the hairpin of these two helices
(see Fig. 4).

An alternative scheme: GLUT1 and the Porins.
Given the foregoing, we searched for an alternative
secondary structure for the transporter. We considered
the structures of those few membrane proteins that have
been solved by crystallography so far, and we came upon
porins. In contrast to α-helical membrane proteins
crystallized earlier, porin monomers form 16-stranded
antiparallel β-barrels (βBs) (Weiss et al., 1991, FEBS Lett.,
280:379-382; Cowan et al., 1992, Nature, 358:727-733).
When we aligned (Fig. 2) the sequences of R. capsulatus
porin (POR; SEQ ID NO:2), E. coli porin (OmpF; SEQ ID
NO:1), and GLUT1 (SEQ ID NO:3) (using BESTFIT), we
found pairwise scores for identity and similarity as
follows: POR-OmpF, 20.0 and 45.7; POR-GLUT1, 19.9 and
46.6; OmpF-GLUT1, 18.2 and 42.9. Porins in general
show little overall primary sequence similarity (Welte
et al., 1991, Biochim. Biophys. Acta, 1080:271-274).
In particular, although the secondary structures of POR
and OmpF are the same, the scores for the alignment are
modest. The alignments of GLUT1 with the porins,
however, elicit about the same scores as the alignment
of the two porins. Hence, we set out to evaluate a
possible porin-fold for GLUT1.
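
A pairwise identity score of the kind quoted above can be illustrated with a short Python sketch. It is not the BESTFIT routine: the function name is hypothetical, the input is assumed to be two already-aligned strings of equal length with "-" marking gaps, and the choice to count identities over gap-free columns only is an assumption about normalization. The similarity score is omitted because it depends on a residue comparison table that is not given here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of gap-free alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: four gap-free columns, three of them identical.
score = percent_identity("ACD-F", "ACEGF")
```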

Prediction of Multiple tm β-strands in Porins.
From exploratory work, we chose a span of 7 residues to
examine POR, OmpF and GLUT1 profiles. We found that
the Union β7 (U_β7) peaks identified the approximate
location and length of the β-strands in both porins
(Fig. 3). The thresholds in Fig. 3 (1.83 for POR; 2.15






for OmpF) were selected so as not to miss any strand;
they result in only minimal overprediction. Segments
were then refined by the CFP procedure. In comparing
the porin structures thus predicted with those known
from x-ray crystallography (Weiss et al., 1991, FEBS
Lett., 280:379-382; Cowan et al., 1992, Nature,
358:727-733), we found success rates [Q3 (Qian, 1988, J.
Mol. Biol., 202:865-884)] of 0.70 and 0.75 for POR and
OmpF, respectively. The correlation coefficients
(Matthews, 1975, Biochim. Biophys. Acta, 405:442-451)
for our predictions were as follows - for POR: α,
0.56; β, 0.70; turns, 0.28; random, 0.48; for OmpF: α,
0.25; β, 0.64; turns, 0.30; random, 0.44. The PHD method
(available only for OmpF) predicted regions with
secondary structure similar to ours (Q3 = 0.68).
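
The two accuracy measures quoted above can be written out in a brief Python sketch. It assumes that predicted and observed structures are given as equal-length strings of one-letter state codes; the function names and the toy strings are hypothetical and are not data from this example.

```python
import math


def q3(predicted, observed):
    """Fraction of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(observed)


def matthews(predicted, observed, state):
    """Matthews correlation coefficient for a single structural state."""
    tp = tn = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used in this sketch: report 0.0 when undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: E = strand, C = coil, H = helix.
observed = "EEEECCHH"
predicted = "EEECCCHE"
accuracy = q3(predicted, observed)
strand_cc = matthews(predicted, observed, "E")
```

Q3 treats all states alike, whereas the correlation coefficient is computed separately for each state and penalizes both over- and underprediction of that state.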

Prediction of Multiple tm β-Strands in GLUT1. We
identified 16 predicted tm β-strands in GLUT1 (Fig. 4).
All were in segments that had been allocated as tm
helices in the 12H model (Fig. 2). Using only H_21
profiles, several of the peaks seen (Fig. 2) appeared
wide enough to be interpretable as tm α-helices with
spans of 21 residues (Mueckler et al., 1985, Science,
229:941-945). However, four of them (arrows in Fig. 4)
were split by predicted turns. The resulting segments
were too short to bridge the membrane as α-helices but
had the correct length for tm β-strands. We termed
such patterns "β-hairpin signatures." Similarly, in
the remaining 8 segments previously predicted as
20-residue helices (Fig. 2) we predicted tm β-strands
approximately 10 residues long, with the rest of the
residues sometimes forming short helices. Our
predictions for the location and length of segments with
secondary structure are in reasonable agreement with
those from the PHD program (Fig. 2).
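
The first pass of such a prediction, marking stretches of a Union profile that lie at or above a threshold such as those used for the porins in Fig. 3, can be sketched in Python. The function name, the `min_len` parameter and the toy profile are hypothetical; the refinement by the CFP procedure, which requires operator judgment, is not represented.

```python
def segments_above(profile, threshold, min_len=1):
    """Return 1-based inclusive (start, end) runs with profile >= threshold.

    Runs shorter than min_len residues are discarded.
    """
    runs = []
    start = None
    for idx, value in enumerate(profile, start=1):
        if value >= threshold:
            if start is None:
                start = idx          # a new run begins here
        elif start is not None:
            if idx - start >= min_len:
                runs.append((start, idx - 1))
            start = None             # the run has ended
    # Close a run that extends to the end of the chain.
    if start is not None and len(profile) - start + 1 >= min_len:
        runs.append((start, len(profile)))
    return runs


# Toy profile on the -4.5 to +4.5 scale, with the POR threshold of Fig. 3.
toy = [0.5, 2.0, 2.5, 1.0, 1.9, 2.2, 2.4]
candidates = segments_above(toy, 1.83)
longer = segments_above(toy, 1.83, min_len=3)
```

A low threshold favors sensitivity, in keeping with the stated aim of not missing any strand, at the cost of some overprediction that the later refinement is meant to remove.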

Given these predictions, we reexamined the
alignment of the sequences of POR, OmpF, and GLUT1. We
verified that segments known to have secondary
structure in one or both porins aligned well with
segments for which we predicted secondary structure in
GLUT1 (Fig. 2). Eleven of the 16 predicted β-strands
in GLUT1 overlapped partially with β-strands in porins.
The paucity of gaps in these regions with conserved
secondary structure is noteworthy. Some of the
remaining β-strands in the porins correspond to
segments predicted as helices in GLUT1 and vice versa.
The alignment in Fig. 2 comprised about the last 400
residues in GLUT1; based on additional alignments, the
N-terminal region of GLUT1 might have originated in
partial duplication of a porin gene. In addition,
there is a high degree of sequence conservation among
members of the GLUT family, and hence a multi β-strand
motif may be applicable to all of them.

Three-dimensional Model of the βB in GLUT1. The
predictions above suggested to us that GLUT1 might fold
as the porins, forming a βB. To visualize whether such
an idealized construct was compatible with GLUT
function, we built a three-dimensional model of the
putative GLUT1 βB, with the more hydrophilic sides of
the tm β-strands facing the barrel pore. To ensure
that there were no bad Van der Waals contacts, limited
energy minimization was performed (300 iterations,
conjugate descent algorithm, DISCOVER program). Fig. 5
shows an end-view photograph of the barrel (from inside
the cell) including β-D-glucopyranose in its lumen.
The Van der Waals inside diameter of the barrel, while
irregular, was at least 11 Å, which is more than enough
to allow hydrated hexoses to pass through the channel.

Prior evidence consistent with a βB fold. The
2-N-[4-(1-azi-2,2,2-trifluoroethyl)benzoyl]-1,3-bis-(D-
mannos-4-yloxy)-2-propylamine (ATB-BMPA) binding site.
Peptide 217-272 appears intracellular, since a specific
Ab binds to it only when the cell membrane is
permeabilized (Davis et al., 1990, Biochem. J.,
266:799-808). This segment is very hydrophilic, so
the more hydrophobic tm segment that follows it is
likely to begin only at or near residue 273 (in any of
the 12H, βB, or PHD predictions; Fig. 2). The next
marker along the chain is residue 282, which has been
recently placed extracellularly, since mutation of it
(Gln to Leu) decreases ATB-BMPA exofacial binding by 95%
(Hashiramoto et al., 1992, J. Biol. Chem., 267:17502-
17507). Hence, segment 273-281 likely spans the
membrane. This segment (9 residues) is too short to be
a tm α-helix (Chin et al., 1987, J. Biol. Chem.,
261:7101-7104) but has the correct length for
a tm β-strand (strand 9, residues 271-280, Figs. 4 and
6). In the 12H model, residue 282 was placed at the
center of tm α-helix 7, where it would be inaccessible
to ATB-BMPA. In the βB model, residue 282 is instead
in an extracellular connecting loop.
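
The length argument used in this passage can be checked with simple arithmetic. The sketch below assumes an axial rise of about 1.5 Å per residue for an α-helix, about 3.3 Å per residue for an extended β-strand, and a hydrophobic membrane core about 30 Å thick. These are common textbook approximations introduced here for illustration only; they are not values stated in this specification, and strand tilt within a barrel is ignored.

```python
import math

# Assumed geometric constants (textbook approximations, not from the source).
ALPHA_RISE = 1.5      # angstroms per residue along an alpha-helix axis
BETA_RISE = 3.3       # angstroms per residue along an extended beta-strand
MEMBRANE_CORE = 30.0  # approximate hydrophobic thickness in angstroms


def span_length(n_residues: int, rise: float) -> float:
    """Axial length covered by n_residues at the given rise per residue."""
    return n_residues * rise


def residues_needed(thickness: float, rise: float) -> int:
    """Smallest number of residues whose axial length reaches `thickness`."""
    return math.ceil(thickness / rise)


# Segment 273-281 (9 residues) modeled as a helix and as a strand.
print(span_length(9, ALPHA_RISE))   # length as an alpha-helix
print(span_length(9, BETA_RISE))    # length as a beta-strand
print(residues_needed(MEMBRANE_CORE, ALPHA_RISE))
print(residues_needed(MEMBRANE_CORE, BETA_RISE))
```

Under these assumptions a helix needs about 20 residues to cross the core whereas a strand needs about 10, which is the distinction drawn above between 21-residue tm helices and roughly 10-residue tm β-strands.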

The proportions of α and β structures in GLUT
based on CD and FTIR spectroscopy. This issue is
unsettled. From FTIR spectroscopic evidence, it was
concluded that GLUT1 displays distinct vibrations for
α-helical structure while those for β-structure are
absent (Chin, supra). This was partly challenged by a
later FTIR study, which also found GLUT1 to be
predominantly α-helical but in addition found evidence
strongly suggesting the presence of some β-structure,
with a portion of it forming antiparallel strands
(Alvarez et al., 1987, J. Biol. Chem., 262:3502-3509).
Interpretations of CD evidence also appear divided. In
one case, CD was said to indicate the presence in GLUT1
of some 82% α-helices, 10% β-turns and 8% random
structure, with no β-strands (Chin et al., 1987,
Proc. Natl. Acad. Sci. USA, 84:4113-4116). However,
more recently, use of an algorithm (Perczel et al.,
1991, Protein Eng., 4:669-679) to analyze CD data led
to predictions (Park et al., 1992, Protein Sci.,
1:1032-1049) of β-structure in GLUT1, POR and OmpF,
among other membrane proteins. Our assignments for
GLUT1 structure are in line with the more recent FTIR
and CD studies (Alvarez et al., supra; Park et al.,
supra).

Solvent accessibility of the GLUT backbone is
better explained by the βB model. Others and ourselves
have reported evidence for the existence of a
water-filled pore across GLUTs (Alvarez et al., 1987, J.
Biol. Chem., 262:3502-3509; Jung et al., 1986, J.
Biol. Chem., 261:9155-9160; Fischbarg et al., 1990,
Proc. Natl. Acad. Sci. USA, 87:3244-3247). Such an
open pathway would have to coexist with an apparent
enzyme-type tight-fitting structure, since GLUTs
display steric selectivity for substrates. This
apparent contradiction may be resolved by noting that the
water permeability of GLUTs (Fischbarg et al., 1993,
Alfred Benzon Symp., 34:432-446) is only some 7% that
of specific water channels (Preston et al., 1992,
Science, 256:385-387), as if water traverses an open
pathway through GLUTs only during part of a cycle of
conformational changes. Both the 12H and βB models
imply a hydrophilic pore in GLUT. On the basis of
hydrogen-deuterium exchange, however, >90% of the GLUT1
amine protons are exchanged almost immediately (Alvarez
et al., supra; Haris et al., 1992, Trends Biochem.
Sci., 17:328-333). These exchange data can be
explained more readily if GLUT1 is a βB with a
solvent-filled pore, as in that case most backbone amine
hydrogens lining the pore and forming connecting loops
would be accessible to solvent.

GLUT1 as a multifunctional βB transporter. From
recent evidence, compounds other than sugars such as
water (Fischbarg et al., 1990, Proc. Natl. Acad. Sci.
U.S.A., 87:3244-3247; Zhang et al., 1991, J. Clin.
Invest., 88:1553-1558), nicotinamide (Sofue et al.,
1992, Biochem. J., 288:669-674), and dehydroascorbic
acid (Vera et al., 1993, Nature, 364:79-82) traverse
GLUTs, suggesting that GLUTs are multifunctional (Sofue
et al., supra). Since a barrel framework is
essentially fixed, as argued for porins, the GLUT1
connecting loops might operate as molecular gates and
might also be involved in binding solutes and
discriminating among them. The putative long
intracellular GLUT1 loop (residues 204-270) may be an
example, since glucose binding to the loop induces a
conformational change in it (Asano et al., 1992, FEBS
Lett., 22JL:129-132) and antibodies against the peptide
Asn-217-Ile-272 inhibit the binding of cytochalasin B
to the protein. This loop may also have a binding
site for ATP (Lys-225-Lys-229) (Carruthers et al.,
1989, Biochemistry, 28:8337-8346) and protein kinase C
phosphorylation sites (Ser-226, Ser-24U) (Deziel et
al., 1989, Int. J. Biochem., 21:807-814), all with
potential functional roles. Lastly, all three
antibodies we tested bind to putative mobile loops and
enhance DOG uptake. The topology we propose is
summarized in Fig. 6.

7. Example: Further Proteins Shown
To Have Beta-barrel Structure

7.1. Materials And Methods
We obtained from databases (Swissprot, Protein
Information Resource) the sequences of:

sw:p06009 Reaction center protein L chain (RCL).
sw:p02945 Bacteriorhodopsin precursor (BR).
sw:p04480 Colicin A (COLA).
pir3:s16070 Rhodobacter capsulatus porin (POR;
SEQ ID NO:2).
sw:p02931 Escherichia coli porin (OmpF; SEQ ID
NO:1).
sw:p11166 Glucose transporter type 1 (GLUT1; SEQ
ID NO:3), erythrocyte/brain.
sw:p29972 Water channel protein for red blood
cells and kidney proximal tubule (CHIP28).
sw:p02710 Acetylcholine receptor protein, alpha
chain precursor.
sw:p02920 Lactose permease.
sw:p13866 Sodium/glucose cotransporter.
sw:p08513 Potassium channel protein, larval
(shaker-epsilon).
sw:p16614 Calcium-transporting ATPase sarcoplasmic
reticulum type.
sw:p20648 Potassium-transporting ATPase alpha
chain (proton pump, gastric H+/K+-ATPase).
(accession codes are given in parentheses).

Predictions. Several algorithms were used. For
hydropathy analysis, we calculated the average
hydrophobicity Hi for a span of i residues using the
Kyte-Doolittle (KD) scale (Kyte and Doolittle, 1982, J.
Mol. Biol., 157:105-132). We used spans of 21 and 7
residues. A span of 21 residues is appropriate because
membrane spanning α-helices are of this or similar
lengths. On the other hand, a shorter span can uncover
trends in the hydrophobicity profile that the larger
span might average out. We decided on 7 residues as
the shortest span to give a representative picture of a
local neighborhood in a chain without giving rise to
excessive "noise". We also used the Union algorithm,
described above, to predict protein segments expected
to be transmembrane, namely, having relatively high
hydrophobicity and propensity to form amphiphilic α or
β structures.
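
The moving-segment average described above may be sketched as follows. The scale values are the published Kyte-Doolittle hydropathy indices. The function name and the handling of chain ends (windows truncated at the termini) are assumptions made for this illustration, since the boundary treatment in the UCFP listing is not legible.

```python
# Kyte-Doolittle hydropathy indices (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, span: int) -> list[float]:
    """Average KD hydropathy over a window of `span` residues centered on
    each position. Windows are truncated at the chain ends (an assumption)."""
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd number")
    half = span // 2
    profile = []
    for center in range(len(sequence)):
        lo = max(0, center - half)
        hi = min(len(sequence), center + half + 1)
        window = sequence[lo:hi]
        profile.append(sum(KD[aa] for aa in window) / len(window))
    return profile


# Hypothetical fragment, used only to show the two spans side by side.
fragment = "MEPSSKKLTGRLMLAVGGAVLGSLQFGYNTGVINAPQKVIEEFY"
h21 = hydropathy_profile(fragment, 21)
h7 = hydropathy_profile(fragment, 7)
print(max(h21), max(h7))
```

Computing both profiles on the same chain makes it easy to see peaks that are continuous with a span of 21 but split with a span of 7.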

We also employed the Chou-Fasman prediction method
as implemented in the Chou-Fasman-Prevelige (CFP)
algorithm (Prevelige et al., "Chou-Fasman prediction of
the secondary structure of proteins: Chou-Fasman-
Prevelige algorithm", in: G.D. Fasman (ed.), "Prediction
of protein structure and the principles of protein
conformation", Plenum Press, New York, New York, pp.
391-416 (1989)). Our figures show α and β average
propensities calculated for tetrapeptides and assigned
to the first residue, following the CFP procedure.
Where these propensities equal or surpass the CF
threshold (100 in their units), we mark the segments ("α
prd" and "β prd" lines). We also show the CF <pt>
propensity; where <pt> exceeds the threshold
recommended in the CFP procedure (0.000075), 4-residue
predicted turns are marked by lines (denoted as "t
prd") beginning with the suprathreshold residue. Our
routine simply marks all such 4-point turns, rather
than attempting to opt between them (as in the CFP
procedure) when they overlap.
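
The tetrapeptide averaging and turn marking described above can be sketched as follows. The table is only a subset, transcribed from those DATA lines of the UCFP listing that are fully legible (residue, α propensity, β propensity, and the four positional turn frequencies); the cutoffs 100 and 0.75e-4 are the values assigned in the listing. The function names are hypothetical, and the assignment of each turn product to the first residue of its tetrapeptide follows the text rather than a legible portion of the code.

```python
# Subset of the CFP parameter table: residue -> (Pa, Pb, (f_i, f_i+1, f_i+2, f_i+3)).
CFP_SUBSET = {
    "A": (139, 79, (0.060, 0.076, 0.035, 0.058)),
    "R": (100, 94, (0.070, 0.106, 0.099, 0.085)),
    "C": (95, 107, (0.149, 0.053, 0.117, 0.128)),
    "I": (99, 157, (0.043, 0.034, 0.013, 0.056)),
    "P": (55, 62, (0.102, 0.301, 0.034, 0.068)),
    "Y": (73, 131, (0.082, 0.065, 0.114, 0.125)),
}
ALPHA, BETA = 0, 1
PROPENSITY_CUT = 100   # alphacut and betacut in the listing
TURN_CUT = 0.75e-4     # turncut in the listing


def tetrapeptide_average(sequence: str, column: int) -> list[float]:
    """Mean propensity of residues i..i+3, assigned to residue i."""
    return [
        sum(CFP_SUBSET[aa][column] for aa in sequence[i:i + 4]) / 4
        for i in range(len(sequence) - 3)
    ]


def turn_propensity(sequence: str) -> list[float]:
    """Product of the four positional turn frequencies for residues i..i+3."""
    products = []
    for i in range(len(sequence) - 3):
        p = 1.0
        for k in range(4):
            p *= CFP_SUBSET[sequence[i + k]][2][k]
        products.append(p)
    return products


def mark(values: list[float], cutoff: float) -> list[bool]:
    """True wherever a value equals or surpasses the cutoff."""
    return [v >= cutoff for v in values]


seq = "AAAAYPRCIIII"  # toy sequence built from the residues in the subset
print(mark(tetrapeptide_average(seq, ALPHA), PROPENSITY_CUT))
print(mark(tetrapeptide_average(seq, BETA), PROPENSITY_CUT))
print(mark(turn_propensity(seq), TURN_CUT))
```

Every suprathreshold position is marked independently, which mirrors the statement that overlapping 4-residue turns are all kept rather than resolved against each other.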

We also used the results obtained with the PHD
neural network prediction program (Rost and Sander,
1992, Nature, 360:540), which runs without human
intervention in a computer, and is therefore unbiased
to that extent.

We found that, as a rule, no single procedure was
completely sufficient, and it was best to combine in
one figure several different types of plots so as to
compare them and derive a global picture for a given
protein. To that end, we wrote a program ("UCFP") in
the PowerBasic language (PowerBASIC Inc., Brentwood,
CA 94513), compiled it, and ran the executable file
under IBM DOS. The source code of UCFP is set forth in
Section 7.1.1, below. UCFP is a predecessor of the
UNION program, and uses as inputs two files: a) the
amino acid sequence of a protein, and b) a file with
literal secondary structural assignment codes for that
sequence, either taken from the Brookhaven database for
proteins with known structure or derived from
predictions for proteins of unknown structure. Our
program computes hydrophobicities, U and CFP α, β and
pt propensities, converts the literal structure codes
into numbers, and generates a columnar output file. We
obtained the figures presented here by importing UCFP
output into a graphics program ("Origin", MicroCal
Software, Northampton, MA 01060). We also found useful
the graphic display program "PSAAM" (Crofts, AR,
"Protein Sequence Analysis and Modeling for Windows 3",
University of Illinois, Urbana, IL (Ph.D.
Dissertation)) to verify the validity of our
algorithms.
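
The core computations performed by UCFP can be outlined in a short sketch. It is written in Python rather than PowerBASIC and is not a transcription of the listing. It assumes that the amphiphilic moment is the magnitude of the vector sum of hydrophobicities over a window at a fixed angular step per residue, that each profile is rescaled linearly to the range -4.5 to +4.5, and that the union value is short-span hydrophobicity plus amphiphilicity minus turn propensity. The operator in the union formula is partly illegible in the listing, so the minus sign is an assumption, as are the angles of 100 and 170 degrees used in the example.

```python
import math


def amphiphilic_moment(hydro: list[float], start: int, span: int,
                       angle_deg: float) -> float:
    """Magnitude of the hydrophobic moment over hydro[start:start+span]."""
    mx = my = 0.0
    for k in range(span):
        theta = math.radians(angle_deg * k)
        mx += math.cos(theta) * hydro[start + k]
        my += math.sin(theta) * hydro[start + k]
    return math.hypot(mx, my)


def normalize(values: list[float], low: float = -4.5,
              high: float = 4.5) -> list[float]:
    """Linear rescaling so that min -> low and max -> high.
    A flat profile is mapped to the midpoint (a choice made for this sketch)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(low + high) / 2.0] * len(values)
    return [low + (high - low) * (v - vmin) / (vmax - vmin) for v in values]


def union(h_short: list[float], amphi: list[float],
          turn: list[float]) -> list[float]:
    """Assumed union value: hydrophobicity + amphiphilicity - turn propensity."""
    return [h + a - t for h, a, t in zip(h_short, amphi, turn)]


# Toy hydrophobicity pattern alternating every residue, as in a beta-strand
# with one face polar and the other apolar.
pattern = [4.5, -3.5, 4.2, -3.9, 3.8, -3.5, 4.5]
print(amphiphilic_moment(pattern, 0, 7, 170.0))  # assumed beta angle
print(amphiphilic_moment(pattern, 0, 7, 100.0))  # assumed alpha angle
```

With an alternating pattern the moment evaluated near 180 degrees is much larger than the moment at 100 degrees, which is the property that lets separate α and β union profiles discriminate between the two kinds of amphiphilic segment.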
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53 if nstq»0 then naeq«1999 

55 £«J 2 °«« ' un4,s * de «i»n9 »»fth ettraterreatrial. 

J ... sair ~ ""Set**. 

DIM seqh(NSEQ)
DIM seqn(NSEQ)
DIM seq$(NSEQ)
DIM vkyte(naa)
DIM pturn(naa)
DIM hout(NSEQ)
DIM houtsh(NSEQ)
DIM amphi(nseq)
DIM amphiout(NSEQ)
DIM amphibeta(NSEQ)
DIM amphialpha(NSEQ)
DIM UALPHA(NSEQ)
DIM U(nseq)
DIM ubeta(NSEQ)
DIM DUM(NSEQ)
DIM DUMN(NSEQ)

2 d°im ::^.r^ for ^ ividu>i «•* 

DIM patetr(nseq) ' indiv. alpha propens. along chain
DIM pam$(nseq)
DIM pbtetr(nseq) ' indiv. beta propens. along chain
DIM pbm$(nseq)
DIM pt(naa, 4)
DIM PTSEQ(NSEQ)
DIM ptm$(nseq)
DIM phda$(nseq)
DIM phdb$(nseq)
DIM phdt$(nseq)
DIM temp$(nseq)

comienzo:
aacodes$ = "ARNDCQEGHILKMFPSTWYV"
alphacut = 100
betacut = 100
turncut = 0.75e-4
CLS
drive$ = "e:"
path$ = "\UCFP"

£8 DEFAULTS • drives ♦ paths ♦ »\* ♦ -ucrp tm- 

99 OPEM DEFAULTS FOR INPUT AS #5 W - IM ' 

io°? irS % dr,w, ^ th, ' f,l ^' f n^8^iph., in ^t.,y«EL^ 

102 "itfnpt • drives ♦ paths ♦ -\" ♦ filename* ♦ - ^. 

103 fileeutS - drives ♦ p. t hS ♦ "\" ♦ f SZrt T -'S^ 

105 'Kyte-Doolittle aeala 



106 
107 
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HI 

; DATA A, 139. 79,0.060.0.076.0.035,0.058 

116 DATA R.100. 94,0.070,0.106,0.099,0.085 

7 £ N 78 "0.161,0.083,0.191,0.09 

8 DATA 0 106, 66.0.147.0.110,0.179,0.081 
119 DATA C 95,107,0.149,0.053.0.117,0.128 
\\l DATA 0 112 100 0.074,0.098.0.037.0.098 

1? DATA E 144 51 .0.0S6.0.060.0.077.0.064 

22 DATA 0 64, 87.0.102.0.085.0.190.0.152 

23 DATA H.112, 85.0.140,0.047.0.093,0.054 
124 DATA I 99,157,0.043,0.034,0.013.0.056 

25 DATS 130 117 0.061,0.025,0.036.0.070 

£ DATA K.121, 73.0.05S.0.115.0.072.0.095 

127 DATA M.132.101,0.068,0.082,0.014,0.055 

28 DATA F.111.123,0.059.0.041.0.0^.0.065 

129 DATA P. 55, 62,0.102,0.301,0.034,0.068 

30 DATA S, 72, 94,0.120,0. 39,0.125,0.106 

31 DATA T. 78.133,0.086.0.108.0.065,0.079 
152 DATA U.103,124,0.077.0.013.0.064.0. 67 

133 DATA Y. 73,131.0.082,0.065.0.114.0.125 

134 DATA V. 97,1t4,0.062,0.O48,0.028,0.053 

135 
136 



» mnmmnnmiutinmnmimtmitmnmmunHi 

S 'Print "Free e-ory: -;fr.CO); fr.(-1); fr.C-2> 

PRINT "UCFP ALGORITHM; J. FUehbT.. F Cujl-T, P. I«rovich. CopyM 9 ht 1994" 
print" Set for sequence lengths up to * nseq 



139 
140 
141 

142 print » 

143 START: 

U5 ESK " ENTER ONE OF THE FOLLOWING - 

146 
147 



55!!} -1 CHANGE FILE NAME FOR INPUT; currently: «}•}*• 
Si "2. CHANGE FILE ^ ««™ " 

149 PRIHT -2. CHANGE ANGLE FOR M^*V^7*tMKjmi- "; 

150 PRIHT « ALPHA STRUCTURE- », •f "^ J* \ ai . currt ntly: "; WHEL 

! 5 5 ; SS 2: 25 tt 5 £ S^T-^tiy; ., — 

153 print "5. CHANGE PATH; CURRENTLY: « P»thS 

S! SiS 3: ^fc,? £ S!S". M irw!T; u "c:i^ «*' « « - «'« 



158 print 

159 PRINT ■ • 

INPUT DUMMY
SELECT CASE DUMMY
CASE 1 : GOSUB FILENAMEINPUT
' CASE 2 : GOSUB FILENAMEOUTPUT
CASE 2 : GOSUB MOMENTANGLE
CASE 3 : GOSUB ALPHAWINDOW
CASE 4 : GOSUB BETAWINDOW
CASE 5 : GOSUB NEWPATH

1» - aSE^'^^SS; reP * ir - *«~ 
CASE 9 : GOTO salida
CASE 0 : GOTO correte
END SELECT
GOTO start

I S? : ^ : » : * : SJ=5 : as 

J« OPEN DEFAULT! FOR OUTPUT AS #6 

190 Sffi S' dr,ve$ ^ th, ' fil ^^n^s^.i phi ,. n9btt ^ HHBL gpin 

191 COSUB WORKING 

192 PRINT 

12 ?[» ^"[P * U " ^CCESSFULLT . STOPPING NOW- 

195 5.11 ? aU tht •t«t«wnti to save memory „« 

J3? 'oil koy arrays erased by now ~' c>n "ot «*> again 

im 'GOTO start 

JJ '""""""""'"^^^ 

1W FILENAME INPUT: 
200 ets 

12 ~" •<-» 

205 chdrive drives 

206 ehdir paths 

207 #Ues „. >$0TII 

210 mmjriamm conusor input seo ; prog ados def 

211 «WT filenames 
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212 fileinp* ■ drive* ♦ path* ♦ "V ♦ filename* ♦ ".sqt" 

213 fileout* • drive* ♦ path* ♦ "\" ♦ filenaiee* ♦ ".det" 
2U IETURN 

215 

MOMENTANGLE:
PRINT "ENTER ANGLE FOR ALPHA STRUCTURES "
INPUT angalpha
PRINT "ENTER ANGLE FOR BETA STRUCTURES "
INPUT angbeta
RETURN

ALPHAWINDOW:
PRINT "ENTER WINDOW SIZE FOR MEMBRANE HELICAL SPANS (ODD NUMBER)"
INPUT WHEL
RETURN

BETAWINDOW:
PRINT "ENTER WINDOW SIZE FOR UNION SPAN (ODD NUMBER)"
INPUT span
RETURN

UNWINDOW:
PRINT "ENTER WINDOW SIZE FOR SMOOTHING UNION"
INPUT UW
RETURN
237 
238 
239 

WORKING:
print fre(0); fre(-1); fre(-2)
OPEN fileinp$ FOR INPUT AS #1
INPUT #1, sequence$
INPUT #1, structure$
CLOSE #1
FOR n = 1 TO 20
READ vkyte(n)
symbols$(n) = MID$(aacodes$, n, 1)
NEXT n
FOR i = 1 TO 20
READ symbols$(i), pai(i), pbi(i), pt(i,1), pt(i,2), pt(i,3), pt(i,4)
NEXT i
RESTORE
PRINT " WORKING "
cfspan = 4 ' prepare for Chou-Fasman-Prevelige tetrapeptides
nlen = LEN(sequence$)
FOR n = 1 TO nlen
seq$(n) = MID$(sequence$, n, 1) ' list of aa codes
NEXT n
FOR I = 1 TO nlen ' from 1 to length of sequence
FOR k = 1 TO 20
IF seq$(I) = symbols$(k) THEN ' identify ordinal for aa
seqh(I) = vkyte(k) ' assign hydrophobicity value to residue
seqn(I) = k ' assign residue name number
pas(I) = pai(k) ' assign alpha propensity
pbs(I) = pbi(k) ' assign beta propensity
EXIT FOR ' done here; leave for/next loop
END IF
NEXT k
NEXT I
272 

273 FOR n * 2 TO (nlan • 2) 

276 trait aaqn 

277 PTSEQ(I) • PTSEQC2) 

278 PTSEO(nlan) ■ PTSEQCnttn -.2) 

279 PTSEO(nlan - 1) » PTSEQ(nlan • 2) 
2M for 1»1 to nltn 

261 if ptsaq(l) >• tumcut than 

282 for ind • 0 to 3 

283 pta*(! ♦ fnd) • -4.5 - : naxt ind : goto eortada 

284 and if 

285 if ptstq(i) < tumcut than 

286 If ptaS(i) - -4.5 - then goto eortada 

287 tli« 

288 ptmt(f) mum 

289 and if 

290 eortada: 

291 naxt i 

292 CALL NORMAL CPTSEQC). nlan, ptaaqO) 



294 . ////////////////// 
295 

' HYDROPHOBICITY CALCULATION FOR MEMBRANE HELICES
FLAG = 1 ' calculate hydrophobicity
j = WHEL ' window
GOSUB MAIN ' and we will get hout(m)
CALL NORMAL(hout(), nlen, hout()) ' and we will get hout long

' HYDROPHOBICITY CALCULATION FOR SHORT SPAN
FLAG = 2 ' calculate hydrophobicity
j = span ' window
GOSUB MAIN ' and we will get houtsh(m)
CALL NORMAL(houtsh(), nlen, houtsh()) ' and we will get hout short

' CALCULATION OF TETRAPEPTIDE PROPENSITIES
j = cfspan
for i = 1 to nlen-3
patetr(i) = ( pas(i) + pas(i+1) + pas(i+2) + pas(i+3) )/cfspan
pbtetr(i) = ( pbs(i) + pbs(i+1) + pbs(i+2) + pbs(i+3) )/cfspan
if patetr(i) >= alphacut then
pam$(i) = "4.5 "
else
pam$(i) = " "
end if
if pbtetr(i) >= betacut then
pbm$(i) = "4.5 "
else
pbm$(i) = " "
end if
next i
for j = 2 to 0 step -1 ' approximate bottom end
patetr(nlen-j) = patetr(nlen-3)
pbtetr(nlen-j) = pbtetr(nlen-3)
next j
CALL NORMALPA(patetr(), nlen, patetr())
CALL NORMALPB(pbtetr(), nlen, pbtetr())

' calculation alpha moment
FLAG = 0 ' selects amphiout output
j = span
ANGLE = angalpha
GOSUB MAIN ' gets amphiout(m)
CALL NORMAL(amphiout(), nlen, amphialpha()) ' gets amphialpha

' calculation beta moment
FLAG = 0 ' selects amphiout output
j = span
ANGLE = angbeta
GOSUB MAIN ' gets amphiout(m)
CALL NORMAL(amphiout(), nlen, amphibeta()) ' gets amphibeta
erase amphiout

' calculate union alpha
CALL UN(ptseq(), houtsh(), amphialpha(), nlen, ualpha())
erase amphialpha
CALL NORMAL(ualpha(), nlen, ualpha())
' calculate union beta
CALL UN(ptseq(), houtsh(), amphibeta(), nlen, ubeta())
erase amphibeta
CALL NORMAL(ubeta(), nlen, ubeta())
erase amphi
erase seqh

. ////////////////// - U111MM 

• imtmiiminiimimniiiiimiiiitiiiiiiHiiiiiitiimm 

» PROCESS STRUCTURE STRING (PREDICTIONS OR CRTSTALLOG.) 
alfan* » "3.5 ■ 
btteu* > "3.5 - 
turnms • "3.5 " 

FOR n • 1 TO nlan m . _ 

tatpS(n) ■ M!OS(ttrueturaS, n, 1) 'Hat of atructura codta 

NEXT n 

for i»1 to nlan 
if ttnpS(i) • "M" than 

phdaS(i) • alfams ; phdbf(i) • » ■ : phdtSO) ■ ■ 
end if 

if tenpS(i) • "E* then 



M 
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380 ff twpscf) . -c- thtn 

382 md if pwwiw ■ ■ : phdtt(l) « « m 

383 If tt*ft(i) . -T- thtn 

385 tnd if • phdtB(f) ■ turrns 

386 ntxt f 

387 clMt i4 

5SS ,r-,a 

GOsii> producto 
390 return 
391 

397 , *^rr , *rr # . sTA « T,, ' G «■» 

I«a W ■ ■» ♦ CJ - 1) / 2» W r BOUNDARY »»10 

400 gosub CALCULATION "«w«T * M0 

*01 next m 

402 ' —•••-MAIN CENTER SEGMENT Ml to nltn - 10 

f05 U8 • m ♦ (j • 1) / 2 * ,+10 

406 GOSUB CALCULATION 

*07 NEXT m 

408 — • END SEGMENT 'nltn-9 to nltn 

a!o f? * " ( J * "i*" \ CJ " f > / » TO nltn etr of window/ 

JJ? H 'low BOUNDARY .-10 

411 Ul i nltn 

412 GOSUB CALCULATION 
♦13 NEXT M 

4H END IP 

*15 RETURN 
416 

HI ' """""'"""""""^^ 

419 CALCULATION: 
420 

422 111 1 ?*' 1 m \ ' Mleul>t » Mroptablclty of ttd. t». w . ., 

S H'/A J," - < - «n -in** V 

426 cum ■ curt ♦ 1 

429 

A 32 °' eu " * 0 ' •■•»« hydrophobicity tcctnulatort •/ 

s £*.\£;?, u ! eu * - « «"~* •" « <» — - / 
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436 



houtshd) . co* / cm ' «*ut. hydrophobic^ aver. 9 . • 

437 NEXT 1 

ELSE IF FLAG = 0 THEN ' calc. hydrophobic moment
acum = 0 : Mx = 0! : My = 0! ' reset amphi accumulators
FOR I = LB TO UB ' go on through all res. in window
x = COS(2 * 3.1416 * ANGLE * (I - LB) / 360) ' Eisenberg
y = SIN(2 * 3.1416 * ANGLE * (I - LB) / 360)
Mx = Mx + (x * seqh(I))
My = My + (y * seqh(I))
acum = acum + 1
NEXT I
amphiout(m) = SQR(Mx ^ 2 + My ^ 2)
END IF

RETURN

4I5 • nniniimuiiimiiimnuiuimmmtinmitmimnn 

4S6 

457 producto: 

45B cS ■ M # " 

459 OPEN fUeoutS FOR OUTPUT AS #2 , 

S? ATS w*S V5 A IW-^— - »~ " 

462 FOR I • 1 TO nlen 

463 locate 16,1 

464 PRINT "l« M ; I; nlen* ■} men 

465 hlng • roundChout(l), 2) 

466 hsh ■ roundChoutsh(l), 2) 

467 ua ■ round<ualphaCl), 2) 

468 ub « round<ubeta(l>, 2) 

469 pa « roundCpatetr(l), 2) 

470 pb ■ round<pbtetr(l), 2) 

477 return 

& - unnnmm nmmnmnn 

480 

salida:
PRINT " ### DONE ###"
stop
end
486 

488 ' ////////////■ 

SUB NORMAL (DUM(), nlen, DUMN())
FOR I = 1 TO nlen
IF DUM(I) > ytop# THEN
ytop# = DUM(I)
yhord = I
end if
IF DUM(I) < ybot# THEN
ybot# = DUM(I)
ylord = I
end if
' yCUM# = yCUM# + DUM(I)
NEXT I
' yAVER# = yCUM# / nlen ' average
FOR I = 1 TO nlen
' print DUM(I)
' print (DUM(I) - ybot#);(ytop# - ybot#);(DUM(I) - ybot#)/(ytop# - ybot#)
DUMN(I) = -4.5 + 9 * (DUM(I) - ybot#) / (ytop# - ybot#)
NEXT I
END SUB

$12 
$13 
5H 

SUB UN (ptseq(), HOUTsh(), AMPHI(), nlen, U())
FOR m = 1 TO nlen
U(m) = HOUTsh(m) + AMPHI(m) - ptseq(m)
NEXT m
END SUB

$21 ' »»»HWtHttH»<M l t HtHt lM«WWt< 

SUB NORMALPA (DUM(), nlen, DUMN())
deltalfa# = 75
lalfa# = 64
FOR I = 1 TO NLEN
DUMN(I) = -4.5 + 9 * (DUM(I) - lalfa#) / (deltalfa#)
NEXT I
END SUB



' tttnmmmiiHitinmmiitmiim/ititttitttmiutimitmntm 



$31 ' 



SUB NORMALPB (DUM(), nlen, DUMN())
deltabeta# = 106
lbeta# = 51
FOR I = 1 TO NLEN
DUMN(I) = -4.5 + 9 * (DUM(I) - lbeta#) / (deltabeta#)
NEXT I
END SUB

7.2. Results
Evaluation of the prediction profiles.
Validations: (1) multihelical proteins. The reaction
center (L chain) constitutes a good example of a
successful prediction (Fig. 7a). The H7 profile marks
several hydrophobic segments which are long enough to
span the membrane. This prediction of long segments is
borne out by the Uα7 peaks, and by the relative paucity
of predicted turns, leaving long stretches of sequence
with little turn propensity and hence with relatively
higher propensity to form structure. An assignment to
multi-α folding could be made at this point, after
which the segments could be refined using a detailed
CFP spreadsheet (Prevelige, supra), cap propensities,
and so on.

Bacteriorhodopsin also evidences long hydrophobic
stretches (H7) borne out by Uα7 peaks, and relatively
few predicted turn regions (Fig. 7b). The trend to
long structured segments is curiously more discernible
in the CFP β-predictions than in the α-predictions.
Still, the protein can be classed as multi-α on the
basis of the length of the predicted segments.

For colicin (Fig. 7c), hydrophobicity analysis
alone seems insufficient, since it predicts long
stretches known as transmembrane as hydrophilic. Our
way of plotting normalized (rather than absolute) H
values exaggerates this trend, which is nonetheless
noteworthy. In this instance, CFP α-predictions and Uα7
profiles demonstrate that the length of the predicted
segments is consistent with a multi-α-helical fold.
Overall CFP α-propensity is higher than that for β (we
plot absolute values for both). Hence, multi-α
assignment seems adequate.






Validations: (2) porins. Rhodobacter capsulatus porin
exemplifies a trend (Fig. 7d): some peaks that appear
as long hydrophobic stretches in the H21 profile are
split in the H7 profile. Even if qualitative judgments
are tentative, this does not happen to the same extent
in RCL or BR. Neither the α nor the β predictions mark
long segments here, and turn propensity peaks appear
frequently along the chain. A tentative assignment of
multi-β fold can be made at this point. If one then
focuses attention on the Uβ7 peaks, one can verify that
they mark the β-segments exceedingly well. At a
threshold of 1.83, all strands will be marked, with
minimal overprediction. Segment lengths could be
further refined as above.
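
Marking segments with a threshold, as described for the Uβ7 profile, can be sketched as follows. The function name, the optional minimum length, and the toy profile are hypothetical; only the threshold value of 1.83 comes from the text.

```python
def segments_above(profile: list[float], threshold: float,
                   min_len: int = 1) -> list[tuple[int, int]]:
    """Return (first, last) residue numbers, 1-based and inclusive, of each
    run of consecutive positions whose value equals or exceeds threshold."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(profile) - start >= min_len:
        segments.append((start + 1, len(profile)))
    return segments


# Toy union profile, not data from any of the proteins discussed here.
u_beta = [0.5, 2.0, 2.1, 1.9, 0.3, 1.0, 2.5, 2.4]
print(segments_above(u_beta, 1.83))
```

Raising or lowering the threshold trades missed strands against overprediction, so in practice it would be chosen per protein, as the different values quoted for the two porins suggest.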

The Escherichia coli porin profiles (Fig. 7e) show
further the limitations of hydrophobicity analysis per
se. The hydrophobicity profiles largely miss the
β-hairpin between residues 35-65. However, the CFP
β-predictions and U for β segments find them. The U
peaks are especially noteworthy; as above, with a
threshold of 2.15, U marks all the β-strands with
minimal overprediction. One can note also repeated
suprathreshold turn predictions, seemingly at regular
intervals; from all this, a tentative assignment of
multi-β structure may be made. This plot also allows
the rare opportunity of evaluating the performance of
the PHD robot by comparing the structure derived from
crystallography with a prediction PHD made of this
protein shortly before it was incorporated to its
database. Practically all structured segments are
detected by PHD, which also does reasonably well in
predicting their lengths. Once more, those lengths are
too short for transmembrane α-helices, but adequate for
β-strands, confirming the tentative assignment above.
There is another feature of the PHD prediction worth
noting: as many as 7 β-segments are predicted as
α-helical, while one of the short α-helices is predicted
as a β-strand. Such types of mispredictions can be
common when humans make their own judgments, so here
the computer brings no improvement. The advantage a
human has is to know that the protein is in a membrane,
and hence that the structured segments predicted as
α-helical are too short to be transmembrane, pointing
instead to a β-barrel.

In closing this section, we note that the group of
proteins reviewed so far has a common feature: they
tend to have relatively short sequences, not exceeding
some 350 amino acids. Perhaps that has made
crystallizing them somewhat easier. Certainly,
predictions also appear relatively straightforward,
compared with some for the longer sequences.

Proteins with unknown structure.

The prediction profiles for facilitative glucose
transporter indicate a number of short segments (Fig.
8a). PHD predicts only three long segments as
α-helical. Yet, of these, the middle one (#230-260)
forms part of a known long intracellular loop, and
might be actually broken by a turn. In view of the
number of predicted short segments, the remaining two
long segments could not suffice to label the protein as
multi-α-helical. When analyzed more closely, predicted
turns can be discerned that could interrupt those
segments. In contrast, the Uβ7 peaks, β predictions,
and a good number of PHD-predicted segments are in
register and give a cogent picture of short segments,
the approximate location of which we mark with arrows.
One can note how the segments may be nested between
predicted turns. Partly on this basis, we have
predicted for this protein a porin-like multi-β
folding, with some α-helices in the connecting loops.
We also show in the fourth panel the possible
orientation of the predicted β-strands.

CHIP28: PHD predicts (Fig. 8b) only two segments
long enough to be transmembrane as α-helices, which
makes assignment as multi-helical somewhat doubtful.
In addition, turn propensities are rather high and
repeated along the chain, which speaks for short
structured segments. The next feature to note are the
β-predictions, Uβ7 peaks and PHD-predicted β segments in
register along the second half of the sequence. If one
now returns to the two long segments, predicted turns
are discernible that could break them (one supra-, one
sub-threshold). Therefore, we assign the protein as
multi-β, and mark the 16 putative segments that would
give it a porin-type fold. In this view, given its
short sequence there would be little in this structure
aside from the barrel itself, since the connecting
loops would be rather short (except perhaps for the
segment 110-140). One might think of it as a
rudimentary or bare-bones channel protein.

The acetylcholine receptor α subunit: The H21
profile (Fig. 8c) yields several hydrophobic stretches
long enough to be transmembrane α-helices; these (M1-4)
have been recognized for years. One of the long
stretches (M2) is under particular scrutiny as a firm
candidate to line the channel (see Karlin A, 1991,
"Explorations of the nicotinic acetylcholine receptor",
The Harvey Lecture series 85:71-107, for how the
different subunits might join to form a channel).

On the other hand, in the profiles shown here,
detail multiplies as one progresses from H21 to the
other ones. It seems particularly noteworthy that the
CFβ and Uβ7 propensities and PHD segment predictions are
in register throughout the sequence. The CFα and Uα7
propensities are not, which gives a tentative
indication of multi-β folding. We have marked with
arrows some segments as the putative 16 β-strands of a
porin fold. In Akabas et al., 1992, Science, 258:307-
310, evidence from cysteine-substituted mutants led the
authors to describe segment 248-254 as probably forming
a β-strand. Our prediction also finds β-structure in
that region.

The potential participation of residues 1-200 in
forming a channel has been apparently neglected so far,
presumably because in the current view, all five
receptor subunits would join together and instead
simply form a channel lined by their M2 segments.
However, these two views are not necessarily mutually
exclusive, as a comparison with porins may show.
Porins consist of trimers in which each monomer forms
its own channel at one end of the molecule. At the
other end, however, the individual channels merge into
one large opening for the trimer. One wonders whether
other membrane proteins may show a similar arrangement
in which channel-containing subunits in varying numbers
join in to share their openings. For the acetylcholine
receptor, it might explain both the clear evidence for
a large opening facing the extracellular space and
lined by the 5 monomers (Karlin, supra), and the
predictions for the stretch 1-200, if each subunit were
to form its own channel at its intracellular end, all
channels eventually merging.

Lactose permease. Once more, the CFβ and Uβ7
profiles go in register, especially in the second half
of the sequence (Fig. 8d). That cannot be said of the
CFα and Uα7 profiles, which again tentatively indicates
multi-β folding. PHD predicts some 15 β segments,
which reinforces this possibility. PHD also predicts
five α-helices with length presumably sufficient to
span the membrane (residues 7-26, 72-90, 194-223,
267-287, and 352-376). However, the segment 194-223 is
marked by several fusions in the top panel as
intracellular, and appears as a hydrophilic region in
H21, so it seems logical to discard it. The remaining
four long segments, even if helical, do not appear
enough to form a transmembrane pore of the dimensions
required for lactose permeation. Besides, all of them
are potentially interrupted or shortened by
suprathreshold turn predictions. In view of all this,
multi-β folding appears a more logical choice.

This conclusion goes counter to that drawn in
several studies in which evidence for a multi-α-helical
fold was presented (Kaback, 1992, Int. Rev. Cytol.,
137:97-125). On the other hand, the possibility of
extensive β-folding for the lac permease has been
advanced before by Radding (Karlin, supra). More
recently, the results of Calamia and Manoil, obtained
from fusions, have been cited to support the topology
of the 12-helix lac permease model (Kaback, supra) and
to support the idea that facilitators conforming to a
6 + 6 hydrophobicity profile are α-helical (Nikaido et
al., 1992, Science, 258:936-942). Calamia and Manoil
apparently selected the locations of their fusions for
the limited aim of discriminating between the 12-helix
and the 14-helix lac permease models. The fact that
their results support the 12-helix model says little
(if anything) about whether α-helical folding is to be
favored over an alternative such as the partial
β-barrel fold proposed by Radding (Karlin, supra), or
over a possible 16-β-strand porin fold. In fact, some
of the findings of Calamia and Manoil may be taken as
possible indications of β-folding. In their own words,
"...it appears that 9-11 apolar membrane spanning
segments can suffice to promote efficient alkaline
phosphatase translocation across the membrane."
Another interpretation might be that the transmembrane
segments referred to would be 9-11 residues in length,
that is, too short to be α-helical but quite of the
correct length for a transmembrane β-strand. In
addition, the segment between fusions 9 and 10, each
one labeling residues as extracellular, is long enough
that the chain, if consisting of short β-strands, could
have entered the cell and returned outside. Lastly,
fusion 29 apparently labels a stretch as intracellular
when an extracellular location was expected; the small
increase in activity of fusion 29 appears of dubious
significance in view of the fact that fusion 13, also
of small but non-zero activity, may be labeling an
intracellular location. In a similar vein, the
observed range of alkaline phosphatase activities (as
against an ideal all-or-none pattern) poses some
question as to which locations the intermediate
activities may be labeling. A more substantial link
between results of fusions and topology may be clearer
if control fusions and subsequent expression can be
done with membrane proteins of known structure.

Sodium-glucose cotransporter (Fig. 8e), K+ channel
(Fig. 8f). Several of the patterns already referred to
above reappear for these sequences. CFβ, Uβ7 and PHD
predictions are in register, while those for CFα and Uα7
do not seem to be. Turn potentials rise regularly and
delimit segments of 10 residues or less. Once more, a
multi-β assignment seems plausible. We have marked
with arrows segments that might contribute to porin
folds. There is evidence that the functional unit for
this K+ channel is a tetramer (MacKinnon, 1991, Nature,
350:232-235); the comments made above for the
acetylcholine receptor apply here as well, namely, each
monomer may have its own channel, with all four
channels merging into one.
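
The observation that turn potentials delimit short segments can be made concrete with a small sketch. It assumes a per-residue turn-propensity profile and a caller-chosen threshold, neither of which is specified in this passage; the 6 to 14 residue window is the β-strand length range stated in the claims. The function names are hypothetical.

```python
def segments_between_turns(turn_profile, threshold):
    """Return (start, end) pairs (0-based, end exclusive) for maximal runs of
    residues whose turn propensity stays below `threshold`."""
    segments = []
    start = None
    for i, value in enumerate(turn_profile):
        if value < threshold:
            if start is None:
                start = i  # a new below-threshold run begins here
        elif start is not None:
            segments.append((start, i))  # a turn peak closes the run
            start = None
    if start is not None:
        segments.append((start, len(turn_profile)))
    return segments


def candidate_beta_segments(turn_profile, threshold, min_len=6, max_len=14):
    """Keep only the turn-delimited segments whose length fits a
    transmembrane beta-strand (6 to 14 residues by default)."""
    return [
        (s, e)
        for s, e in segments_between_turns(turn_profile, threshold)
        if min_len <= e - s <= max_len
    ]
```

Segments longer than the upper bound are dropped rather than split, since a long stretch without turns would be a helix candidate instead.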

Calcium ATPase (Fig. 8g), H+/K+-ATPase (Fig. 8h).
The length of the sequences does not allow an intuitive
comparison on the same basis as we have done until now.
Of course, these proteins may contain homologous
internal repeats, and if so perhaps a more detailed
analysis might be performed on each repeat. For the
present purposes, we will simply call attention to the
CFβ, Uβ7 and PHD predictions in register, the number of
comparatively short segments predicted (having the
proper length for transmembrane β-strands), and the
regularity with which peaks appear in the <pt> profile,
all of which is consistent with β-folding.
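
For orientation, the U profiles discussed above combine the window-averaged hydropathy, the hydrophobic moment and the turn propensity as U = H + μ - <pt>, with an inter-residue angle of 100° for α structures and 160° for β structures. The following is a minimal sketch under stated assumptions, not the source code of the specification: the moment is taken in Eisenberg's per-residue form (divided by the window length), and the turn propensities are supplied by the caller, one value per window. All function names are illustrative.

```python
import math

# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, start, x):
    """Average Kyte-Doolittle hydropathy over x residues from `start`."""
    return sum(KD[r] for r in seq[start:start + x]) / x


def window_moment(seq, start, x, angle_deg):
    """Hydrophobic moment over x residues, normalized by x (assumption).

    angle_deg is the angle between successive residues:
    100 for alpha structures, 160 for beta structures.
    """
    delta = math.radians(angle_deg)
    window = seq[start:start + x]
    s = sum(KD[r] * math.sin(n * delta) for n, r in enumerate(window))
    c = sum(KD[r] * math.cos(n * delta) for n, r in enumerate(window))
    return math.hypot(s, c) / x


def u_profile(seq, x, angle_deg, turn):
    """U = H + mu - <pt> for every window of span x along seq.

    `turn` must hold one turn-propensity value per window.
    """
    n_windows = len(seq) - x + 1
    if n_windows < 1 or len(turn) != n_windows:
        raise ValueError("need exactly one turn value per window")
    return [
        window_hydropathy(seq, i, x) + window_moment(seq, i, x, angle_deg) - turn[i]
        for i in range(n_windows)
    ]
```

Calling `u_profile(seq, 7, 160, turn)` would give a Uβ7-style profile and `u_profile(seq, 7, 100, turn)` a Uα7-style one, so the two can be compared over the same sequence.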

Profile analysis of environmental similarity. We
resorted to this recently developed methodology
(Gribskov et al., 1990, "Profile analysis", in: R.F.
Doolittle (ed.), Methods in Enzymology, Academic
Press, New York, pp. 146-159; Bowie et al., 1991,
Science, 253:164-170). We chose as terms of comparison
two environments, those of RCL and OmpF, and set out to
investigate whether membrane proteins of interest would
have environmental scores closer to one or the other.
The results are summarized in Figs. 9a and 9b. For
reasons we discuss below, we think this type of
analysis does not perform optimally for membrane
proteins; still, some trends are apparent. The OmpF
profile (Fig. 9a) recognizes several porins and members
of the major facilitator superfamily of proteins, a
group that includes the sugar transporters, and gives
them better scores than those of most unrelated
globular proteins or BR. Conversely, the RCL profile
(Fig. 9b) recognizes the RC M chain and BR better than
facilitators or porins.






7.3. Discussion 

Translocators: economy of the barrel design.

Since a main common function of transporters and
channels is to allow passage ("translocation") of
solutes across the membrane, in what follows we will
refer to them as "translocators". Given a limited
number of residues, a β-strand can span a membrane with
far fewer of them than an α-helix (beginning with six
(Rosenbusch, 1985, EMBO J., 4:1593-1597); 10 is
certainly adequate). Hence, as already noted (Radding,
1991, J. Theor. Biol., 150:239-249), far fewer residues
are needed to configure a transmembrane translocation
unit if the unit is a β-barrel than if the
transmembrane segments are α-helical.
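
The economy argument can be checked with simple arithmetic. The sketch below assumes textbook values that are not taken from this specification: a rise of about 1.5 angstroms per residue for an α-helix, about 3.3 angstroms per residue for an extended β-strand, and a hydrophobic membrane core of about 30 angstroms. The optional tilt parameter is likewise an illustrative addition.

```python
import math


def residues_to_span(thickness_A, rise_per_residue_A, tilt_deg=0.0):
    """Smallest number of residues whose projected length along the
    membrane normal reaches `thickness_A` (all lengths in angstroms)."""
    axial_rise = rise_per_residue_A * math.cos(math.radians(tilt_deg))
    return math.ceil(thickness_A / axial_rise)


# Assumed values: 30 A hydrophobic core, 1.5 A (helix) and 3.3 A (strand) rise.
print(residues_to_span(30.0, 1.5))  # alpha-helix: 20
print(residues_to_span(30.0, 3.3))  # beta-strand: 10
```

Under these assumptions a helix needs about twice as many residues per crossing as a strand, in line with the figure of about 10 residues given above for a β-strand.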

The width of the barrel, and the role of the
connecting loops. When contemplating a possible β-barrel
model for translocators, it seems logical at first to
focus on known β-barrel folds so as to determine which
one might have a channel suited for translocation. The
choice so far seems limited to two main types, the α-β
barrels of isomerase-type enzymes (Farber et al., 1990,
TIBS, 15:228-234) and the porins. The β-barrel lumen
of the 8-stranded isomerase fold, however, appears to
be very small, perhaps only 1-2 angstroms. Of course,
the pore of the 16-stranded porins is much wider; in
OmpF, even with a loop inside its pore and constricting
it, its diameter is 7x11 angstroms (Cowan et al., 1992,
Nature, 358:727-733). This is adequate for large
solutes, but appears excessive for ionic channels and
transporters of small solutes. If such translocators
have a porin fold, their pores may be modified by
loops. Hence, some connecting loops in translocators
may fulfill specific functions such as gating a
channel, constricting a channel pore, binding to and
hence selecting solutes, binding metabolites and
cofactors, signaling destination in protein traffic,
etc. Evolutionarily, it seems easier to explain the
development of translocators if a common translocation
unit was conserved (a 16-stranded β-barrel) and
different loops evolved for different functions. A
similar scheme was advanced by Nikaido and Saier for
bacterial facilitators, except that the translocation
unit they envisaged was 12-α-helical (Nikaido et al.,
1992, Science, 258:936-942). In our view, the common
translocation unit would be a β-barrel. With this
proviso, the idea of a common translocation unit could
be extended to ionic channels (see Fig. 8f), with
suitable loops evolutionarily grafted for each given
protein (Nikaido, supra). In fact, a β-barrel model
has been previously proposed for the voltage-activated
K+ channel (Bogusz et al., 1992, Protein Eng.,
5(4):285-293).

For an alternative, one would have to consider the
evolutionary development of translocators by a process
that would have tailored the number of strands and
hence the width of the channel to the size of the
solute considered. Aside from being overly complex,
that is not what the evidence points to for bacterial
facilitators (Nikaido, supra). In this light, we deem
the work of Radding (Karlin, supra) important in
pointing out the possible presence of β structure in lac
permease and the Na+/H+ antiporter, a concept with which
we agree (cf. our Figs. 9d for the lac permease and 9h
for the H+/K+-ATPase). On the other hand, the partial
β-barrels that he proposes may be more difficult to
marry with the evolutionary considerations above. From
all this, the porin fold emerges as an interesting
candidate for a template common to most if not all
translocators.

The connecting loops of barrels. In an anti-parallel
β-barrel, the loops connecting one strand with
the next one can be relatively short, sometimes no
longer than needed for a turn. The arrangement has a
certain symmetry in that each strand connects only with
the neighboring ones, thereby decreasing potential
steric conflicts between different loops. This is what
happens in porins. In the view we propose, such loops
would be crucial, since the translocating unit made out
of a β-barrel would be too static to result in, say,
gating. Conformational changes associated with binding
and/or selectivity are also easier to conceive if they
involve only loops, rather than massive protein
segments.

One might also mention that finite water permeability
through proteins has been shown to exist not
only across water channels such as CHIP28 or γ-TIP but
across several transporters such as GLUT1, the
sodium/glucose cotransporter, and CFTR (Hasegawa et
al., 1992, Science, 258:1477-1479). Water permeation
could of course take place through any type of
preferential pathway in a protein, but the presence of
β-barrels acting as translocation pathways would
provide a ready explanation for water passage through
transporters.

Analysis of environmental scores in membrane
proteins. The profile analysis methodology has been
developed for globular proteins. Hence, as it
currently stands, the side chains pointing outward from
the protein are necessarily assumed to be exposed to
water. By design, the profile program does not
differentiate between globular and membrane proteins.
In consequence, the side chains of membrane proteins
projecting outward from the transmembrane segments,
which would interact with the lipid membrane milieu and
ought to be considered buried, may be treated by the
current algorithm as exposed. This trend can be
gathered from the third panel in Fig. 8d, showing the
environment of Rhodobacter capsulatus porin. For a
visual impression, we arbitrarily converted the six
side-chain environment categories (B1, B2, B3, P1, P2,
and E) into respective environmental "hydrophobicities"
((1 - fraction polar) x (area buried)) using average
values from Fig. 4 of Bowie et al. (1991, Science,
253:164-170). In principle, each consecutive residue
in a transmembrane β-strand might be expected to show a
clear alternation in environment with respect to the
prior one. Some limited alternation is detected for
the strands (panel 3, Fig. 7d), but only rarely going
into the high hydrophobicity region that would be
expected for the bilayer environment. We believe that
perhaps that is why the global scores we obtain in
Figs. 9a and 9b are lower than those obtained for
globular proteins, and why the algorithm does not
separate the protein scores as it otherwise might.
Still, even with limitations, the algorithm is
promising in that it does some discrimination
consistent with expectations.

Functional possibilities for multimers. Antiporters
such as the H+/K+-ATPase, plus symports such as
the Na+/K+/2Cl- transporters, raise the questions of
whether the multiply transported ions might share the
same route through the protein, and how that could be,
especially for ions of opposite charge. Consideration
of the porin arrangement leads us to speculate that
perhaps the paths for the individual species might be
separate, after all; each species might traverse the
channel of a different "repeat", each one having its
own suitable selectivity. Merging of the channels
might account somehow for the stoichiometry observed.

Various publications are cited herein, the texts
of which are hereby incorporated by reference in their
entireties.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Ala Glu Ile Tyr Asn Lys Asp Gly Asn Lys Val Asp Leu Tyr Gly Lys
1 5 10 15

Ala Val Gly Leu His Tyr Arg Ser Lys Gly Asn Gly Glu Asn Ser Tyr
20 25 30

Gly Gly Asn Gly Asp Met Thr Tyr Ala Arg Leu Gly Phe Lys Gly Glu
35 40 45

Thr Gln Ile Asn Ser Asp Leu Thr Gly Tyr Gly Gln Trp Glu Tyr Asn
50 55 60

Phe Gln Gly Asn Asn Ser Glu Gly Ala Asp Ala Gln Thr Gly Asn Lys
65 70 75 80

Thr Arg Leu Ala Phe Ala Gly Leu Lys Tyr Ala Asp Val Gly Ser Phe
85 90 95

Asp Tyr Gly Arg Asn Tyr Gly Val Val Tyr Asp Ala Leu Gly Tyr Thr
100 105 110

Asp Met Leu Pro Glu Phe Gly Gly Asp Thr Ala Tyr Ser Asp Asp Phe
115 120 125

Phe Val Gly Arg Val Gly Gly Val Ala Thr Tyr Arg Asn Ser Asn Phe
130 135 140

Phe Gly Leu Val Asp Gly Leu Asn Phe Ala Val Gln Tyr Leu Gly Lys
145 150 155 160

Asn Glu Arg Asp Thr Ala Arg Arg Ser Asn Gly Asp Gly Val Gly Gly
165 170 175

Ser Ile Ser Tyr Glu Tyr Asx Gly Phe Gly Ile Val Gly Ala Tyr Gly
180 185 190

Ala Ala Asp Arg Thr Asn Leu Gln Glu Ala Gln Pro Leu Gly Asn Gly
195 200 205

Lys Lys Ala Glu Gln Trp Ala Thr Gly Leu Lys Tyr Asp Ala Asn Asn
210 215 220

Ile Tyr Leu Ala Ala Asn Tyr Gly Glu Thr Arg Asn Ala Thr Pro Ile
225 230 235 240

Thr Asn Lys Phe Thr Asn Thr Ser Gly Phe Ala Asn Lys Thr Gln Asp
245 250 255

Val Leu Leu Val Ala Gln Tyr Gln Phe Asp Phe Gly Leu Arg Pro Ser
260 265 270

Ile Ala Tyr Thr Lys Ser Lys Ala Lys Asp Val Glu Gly Ile Gly Asp
275 280 285

Val Asp Leu Val Asn Tyr Phe Glu Val Gly Ala Thr Tyr Tyr Phe Asn
290 295 300

Lys Asn Met Ser Thr Tyr Val Asp Tyr Ile Ile Asn Gln Ile Asp Ser
305 310 315 320

Asp Asn Lys Leu Gly Val Gly Ser Asp Asp Thr Val Ala Val Gly Ile
325 330 335 
Thr Ala Val Asp His Lys Ala Tyr Gly Leu Ser Val Asp Ser Thr Phe
225 230 235 240

Gly Ala Thr Thr Val Gly Gly Tyr Val Gln Val Leu Asp Ile Asp Thr
245 250 255

Ile Asp Asp Val Thr Tyr Tyr Gly Leu Gly Ala Ser Tyr Asp Leu Gly
260 265 270

Gly Gly Ala Ser Ile Val Gly Gly Ile Ala Asp Asn Asp Leu Pro Asn
275 280 285 

Ser Asp Asn Val Ala Asp Leu Gly Val Lys Phe Lys Phe 
290 295 300 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 492 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Human 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..492 

(C) OTHER INFORMATION: Facilitative glucose transporter Glut1 protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Glu Pro Ser Ser Lys Lys Leu Thr Gly Arg Leu Met Leu Ala Val 
1 5 10 15 

Gly Gly Ala Val Leu Gly Ser Leu Gln Phe Gly Tyr Asn Thr Gly Val
20 25 30

Ile Asn Ala Pro Gln Lys Val Ile Glu Glu Phe Tyr Asn Gln Thr Trp
35 40 45

Val His Arg Tyr Gly Glu Ser Ile Leu Pro Thr Thr Leu Thr Thr Leu
50 55 60

Trp Ser Leu Ser Val Ala Ile Phe Ser Val Gly Gly Met Ile Gly Ser
65 70 75 80

Phe Ser Val Gly Leu Phe Val Asn Arg Phe Gly Arg Arg Asn Ser Met
85 90 95

Leu Met Met Asn Leu Leu Ala Phe Val Ser Ala Val Leu Met Gly Phe
100 105 110

Ser Lys Leu Gly Lys Ser Phe Glu Met Leu Ile Leu Gly Arg Phe Ile
115 120 125

Ile Gly Val Tyr Cys Gly Leu Thr Thr Gly Phe Val Pro Met Tyr Val
130 135 140 
Claims

1. A method of predicting the tendency of a protein
to form an amphiphilic α structure,
comprising calculating a series of values for
Uαx for a series of portions of the protein, each
portion having a span of x residues, wherein the
series of portions spans the protein, and wherein
x is any integer, comprising calculating a value
for Uαx using the equation Uαx = Hx + μαx - <pt>,
wherein Hx is the average hydrophobicity for a
span of x residues using the Kyte-Doolittle scale,
μαx is the hydrophobic moment (span x) for α
structures, the angle between one residue and the
successive residue being 100°, and <pt> is the
position dependent turn propensity, and further
comprising depicting the values for Uαx
graphically to form a series of peaks, wherein
peaks wide enough to correspond to a segment of
the amino acid sequence long enough to span the
membrane as an α-helix are predicted to be α
structures.

2. The method according to claim 1, using the source
code set forth in pages 12 - 18 of the
specification.

3. The method according to claim 1, using the source
code set forth in pages 20 - 32 of the
specification.

4. The method according to claim 1, where x has a
value of seven.

5. The method according to claim 2, where x has a
value of seven.

6. The method according to claim 3, where x has a
value of seven.

7. The method according to claim 1, where x has a
value of twenty-one.

8. The method according to claim 2, where x has a
value of twenty-one.

9. The method according to claim 3, where x has a
value of twenty-one.

10. A method of predicting the tendency of a protein
to form an amphiphilic β structure,
comprising calculating a series of values for
Uβx for a series of portions of the protein, each
portion having a span of x residues, wherein the
series of portions spans the protein, and wherein
x is any integer, comprising calculating a value
for Uβx using the equation Uβx = Hx + μβx - <pt>,
wherein Hx is the average hydrophobicity for a
span of x residues using the Kyte-Doolittle scale,
μβx is the hydrophobic moment (span x) for β
structures, the angle between one residue and the
successive residue being 160°, and <pt> is the
position dependent turn propensity, and further
comprising depicting the values for Uβx
graphically to form a series of peaks, wherein
peaks that are too narrow to correspond to a
segment of the amino acid sequence long enough to
span the membrane as an α-helix but which are wide
enough to correspond to a segment of the amino
acid sequence with a length between 6 and 14 amino
acid residues are predicted to be β structures.



WO 96/18957 PCT/US95/16126



11. The method according to claim 10, using the
algorithm set forth in pages 12 - 18 of the
specification.

12. The method according to claim 10, using the
algorithm set forth in pages 20 - 32 of the
specification.

13. The method according to claim 10, where x has a
value of seven.

14. The method according to claim 11, where x has a
value of seven.

15. The method according to claim 12, where x has a
value of seven.

16. The method according to claim 10, where x has a
value of twenty-one.

17. The method according to claim 11, where x has a
value of twenty-one.

18. The method according to claim 12, where x has a
value of twenty-one.